Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Karan Sikka is active.

Publication


Featured research published by Karan Sikka.


international conference on computer vision | 2012

Exploring bag of words architectures in the facial expression domain

Karan Sikka; Tingfan Wu; Joshua Susskind; Marian Stewart Bartlett

Automatic facial expression recognition (AFER) has advanced substantially over the past two decades. This work explores the application of bag of words (BoW), a highly mature approach for object and scene recognition, to AFER. We first highlight the reasons that make the BoW task for AFER differ from object and scene recognition, and then propose suitable extensions to the BoW architecture for the AFER task. These extensions address some of the limitations of current state-of-the-art appearance-based approaches to AFER. Our BoW architecture is based on the spatial pyramid framework, augmented by multiscale dense SIFT features and a recently proposed approach for object classification: locality-constrained linear coding with max-pooling. Combining these, we obtain a powerful facial representation that works well even with linear classifiers. We show that a well-designed BoW architecture can provide a performance benefit for AFER, and we empirically evaluate the elements of the proposed architecture. The proposed BoW approach surpasses previous state-of-the-art results, achieving an average recognition rate of 96% on two public AFER datasets.
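
As a rough illustration of the pipeline described in the abstract, the sketch below wires together dense SIFT, a learned codebook, spatial-pyramid max-pooling and a linear SVM in Python. It is a minimal sketch under simplifying assumptions: hard-assignment coding stands in for locality-constrained linear coding (LLC), and the grid step, vocabulary size, helper names and data handling are hypothetical rather than the authors' implementation.

```python
# Minimal sketch of a BoW expression pipeline:
# dense SIFT -> codebook -> spatial-pyramid max-pooling -> linear SVM.
# Hard assignment stands in for LLC; all parameters are hypothetical.
import numpy as np
import cv2
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def dense_sift(gray, step=8, size=16):
    """SIFT descriptors on a regular grid; returns (descriptors, keypoint xy)."""
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), size)
           for y in range(0, gray.shape[0], step)
           for x in range(0, gray.shape[1], step)]
    kps, desc = sift.compute(gray, kps)
    return desc, np.array([kp.pt for kp in kps])

def pyramid_max_pool(codes, xy, shape, levels=(1, 2, 4)):
    """Max-pool per-descriptor codes over spatial-pyramid cells and concatenate."""
    h, w = shape
    feats = []
    for g in levels:
        cells = np.zeros((g, g, codes.shape[1]))
        rows = np.minimum((xy[:, 1] / h * g).astype(int), g - 1)
        cols = np.minimum((xy[:, 0] / w * g).astype(int), g - 1)
        for r, c, code in zip(rows, cols, codes):
            cells[r, c] = np.maximum(cells[r, c], code)
        feats.append(cells.ravel())
    return np.concatenate(feats)

def encode_image(gray, kmeans):
    desc, xy = dense_sift(gray)
    onehot = np.eye(kmeans.n_clusters)[kmeans.predict(desc.astype(np.float64))]
    return pyramid_max_pool(onehot, xy, gray.shape)

def train(images, labels, vocab_size=256):
    """images: grayscale face crops; labels: expression ids (hypothetical data)."""
    all_desc = np.vstack([dense_sift(im)[0] for im in images])
    kmeans = MiniBatchKMeans(n_clusters=vocab_size).fit(all_desc.astype(np.float64))
    X = np.array([encode_image(im, kmeans) for im in images])
    return kmeans, LinearSVC(C=1.0).fit(X, labels)
```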


international conference on multimodal interfaces | 2013

Multiple kernel learning for emotion recognition in the wild

Karan Sikka; Karmen Dykstra; Suchitra Sathyanarayana; Gwen Littlewort; Marian Stewart Bartlett

We propose a method to automatically detect emotions in unconstrained settings as part of the 2013 Emotion Recognition in the Wild Challenge [16], organized in conjunction with the ACM International Conference on Multimodal Interaction (ICMI 2013). Our method combines multiple visual descriptors with paralinguistic audio features for multimodal classification of video clips. Extracted features are combined using Multiple Kernel Learning and the clips are classified using an SVM into one of the seven emotion categories: Anger, Disgust, Fear, Happiness, Neutral, Sadness and Surprise. The proposed method achieves competitive results, with an accuracy gain of approximately 10% above the challenge baseline.
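
For readers wanting a concrete starting point, the snippet below shows the general shape of combining per-modality kernels in a precomputed-kernel SVM. It is only a sketch: a fixed convex combination of RBF kernels stands in for the learned Multiple Kernel Learning weights, and the feature matrices, weights and SVM parameters are hypothetical.

```python
# Sketch of multimodal classification with a weighted sum of per-modality kernels.
# Fixed weights stand in for learned MKL weights; features and parameters are hypothetical.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def combined_kernel(feats_a, feats_b, weights, gamma=1e-3):
    """Weighted sum of per-modality RBF kernels between two sets of clips."""
    K = np.zeros((feats_a[0].shape[0], feats_b[0].shape[0]))
    for Xa, Xb, w in zip(feats_a, feats_b, weights):
        K += w * rbf_kernel(Xa, Xb, gamma=gamma)
    return K

def train_and_predict(train_feats, y_train, test_feats, weights=(0.6, 0.4)):
    """train_feats / test_feats: lists of per-modality matrices, e.g. [visual, audio]."""
    K_train = combined_kernel(train_feats, train_feats, weights)
    K_test = combined_kernel(test_feats, train_feats, weights)
    clf = SVC(kernel="precomputed", C=10.0).fit(K_train, y_train)
    return clf.predict(K_test)  # one of the seven emotion categories
```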


ieee international conference on automatic face gesture recognition | 2015

The more the merrier: Analysing the affect of a group of people in images

Abhinav Dhall; Jyoti Joshi; Karan Sikka; Roland Goecke; Nicu Sebe

The recent advancement of social media has given users a platform to socially engage and interact with a global population. With millions of images being uploaded onto social media platforms, there is increasing interest in inferring the emotion and mood displayed by a group of people in images. Automatic affect analysis research has come a long way but has traditionally focussed on a single subject in a scene. In this paper, we study the problem of inferring the emotion of a group of people in an image. Group affect has wide applications in retrieval, advertisement, content recommendation and security. The contributions of the paper are: 1) a novel emotion-labelled database of groups of people in images; 2) a Multiple Kernel Learning based hybrid affect inference model; 3) a scene context based affect inference model; 4) a user survey to better understand the attributes that influence the perceived affect of a group of people in an image. The detailed experimental validation provides a rich baseline for the proposed database.


Pediatrics | 2015

Automated Assessment of Children's Postoperative Pain Using Computer Vision.

Karan Sikka; Alex A. Ahmed; Damaris Diaz; Matthew S. Goodwin; Kenneth D. Craig; Marian Stewart Bartlett; Jeannie S. Huang

BACKGROUND: Current pain assessment methods in youth are suboptimal and vulnerable to bias and underrecognition of clinical pain. Facial expressions are a sensitive, specific biomarker of the presence and severity of pain, and computer vision (CV) and machine-learning (ML) techniques enable reliable, valid measurement of pain-related facial expressions from video. We developed and evaluated a CVML approach to measure pain-related facial expressions for automated pain assessment in youth. METHODS: A CVML-based model for assessment of pediatric postoperative pain was developed from videos of 50 neurotypical youth 5 to 18 years old in both endogenous/ongoing and exogenous/transient pain conditions after laparoscopic appendectomy. Model accuracy was assessed for self-reported pain ratings in children and time since surgery, and compared with by-proxy parent and nurse estimates of observed pain in youth. RESULTS: Model detection of pain versus no-pain demonstrated good-to-excellent accuracy (Area under the receiver operating characteristic curve 0.84–0.94) in both ongoing and transient pain conditions. Model detection of pain severity demonstrated moderate-to-strong correlations (r = 0.65–0.86 within; r = 0.47–0.61 across subjects) for both pain conditions. The model performed equivalently to nurses but not as well as parents in detecting pain versus no-pain conditions, but performed equivalently to parents in estimating pain severity. Nurses were more likely than the model to underestimate youth self-reported pain ratings. Demographic factors did not affect model performance. CONCLUSIONS: CVML pain assessment models derived from automatic facial expression measurements demonstrated good-to-excellent accuracy in binary pain classifications, strong correlations with patient self-reported pain ratings, and parent-equivalent estimation of children’s pain levels over typical pain trajectories in youth after appendectomy.
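
The evaluation protocol described above (ROC analysis for pain versus no-pain, and correlation of predicted severity with self-report, within and across subjects) can be summarized in a few lines of Python. The sketch below uses hypothetical arrays and function names; it is not the study's code.

```python
# Sketch of the evaluation metrics reported above; all inputs are hypothetical arrays.
import numpy as np
from sklearn.metrics import roc_auc_score
from scipy.stats import pearsonr

def evaluate(pain_labels, model_scores, self_report, predicted_severity, subject_ids):
    auc = roc_auc_score(pain_labels, model_scores)            # pain vs. no-pain detection
    r_across, _ = pearsonr(self_report, predicted_severity)   # pooled across subjects
    r_within = []
    for s in np.unique(subject_ids):                          # per-subject severity agreement
        m = subject_ids == s
        if m.sum() > 2:
            r_within.append(pearsonr(self_report[m], predicted_severity[m])[0])
    return auc, r_across, float(np.mean(r_within))
```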


computer vision and pattern recognition | 2017

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos

Amlan Kar; Nishant Rai; Karan Sikka; Gaurav Sharma

We propose a novel method for temporally pooling frames in a video for the task of human action recognition. The method is motivated by the observation that only a small number of frames, taken together, contain sufficient information to discriminate the action class present in a video from the rest. The proposed method learns to pool such discriminative and informative frames while discarding the majority of non-informative frames in a single temporal scan of the video. It does so by continuously predicting the discriminative importance of each video frame and pooling frames accordingly within a deep learning framework. We show the effectiveness of the proposed pooling method on standard benchmarks, where it consistently improves on baseline pooling methods with both RGB and optical-flow based convolutional networks. Further, in combination with complementary video representations, we obtain results competitive with the state of the art on two challenging and publicly available benchmark datasets.
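
A rough PyTorch sketch of the idea, assuming pre-extracted per-frame features, is shown below: a small scoring network predicts each frame's importance from the running pooled feature, and frames are aggregated as a weighted running mean in a single temporal pass. The layer sizes, scoring network and classifier are hypothetical, not the AdaScan architecture itself.

```python
# Sketch of adaptive scan pooling: score each frame's importance from the current
# pooled feature, then update the pooled feature as an online weighted mean.
# All sizes are hypothetical placeholders.
import torch
import torch.nn as nn

class AdaptiveScanPool(nn.Module):
    def __init__(self, feat_dim, n_classes, hidden=512):
        super().__init__()
        self.importance = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, frames):                      # frames: (batch, time, feat_dim)
        pooled = torch.zeros_like(frames[:, 0])
        total = torch.zeros(frames.size(0), 1, device=frames.device)
        for t in range(frames.size(1)):
            f_t = frames[:, t]
            alpha = self.importance(torch.cat([pooled, f_t], dim=1))   # (batch, 1)
            total = total + alpha
            pooled = pooled + (alpha / total.clamp(min=1e-6)) * (f_t - pooled)
        return self.classifier(pooled)
```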


british machine vision conference | 2015

Deep Q-learning for Active Recognition of GERMS: Baseline performance on a standardized dataset for active learning.

Mohsen Malmir; Karan Sikka; Deborah Forster; Javier R. Movellan; Garrison W. Cottrell

Affiliations: Machine Perception Lab, University of California San Diego, San Diego, CA, USA (Mohsen Malmir, Karan Sikka, Deborah Forster); Emotient, Inc., San Diego, CA, USA (Javier R. Movellan); Computer Science and Engineering Dept., University of California San Diego, San Diego, CA, USA (Garrison W. Cottrell).


Computer Vision and Image Understanding | 2017

Deep active object recognition by joint label and action prediction

Mohsen Malmir; Karan Sikka; Deborah Forster; Ian R. Fasel; Javier R. Movellan; Garrison W. Cottrell

An active object recognition system has the advantage of being able to act in the environment to capture images that are more suited for training and that lead to better performance at test time. In this paper, we propose a deep convolutional neural network for active object recognition that simultaneously predicts the object label, and selects the next action to perform on the object with the aim of improving recognition performance. We treat active object recognition as a reinforcement learning problem and derive the cost function to train the network for joint prediction of the object label and the action. A generative model of object similarities based on the Dirichlet distribution is proposed and embedded in the network for encoding the state of the system. The training is carried out by simultaneously minimizing the label and action prediction errors using gradient descent. We empirically show that the proposed network is able to predict both the object label and the actions on GERMS, a dataset for active object recognition. We compare the test label prediction accuracy of the proposed model with Dirichlet and Naive Bayes state encoding. The results of experiments suggest that the proposed model equipped with Dirichlet state encoding is superior in performance, and selects images that lead to better training and higher accuracy of label prediction at test time.
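
At a high level, the joint label-and-action training can be sketched as a two-headed network whose loss sums a label cross-entropy term and a Q-learning style regression on the chosen action. The sketch below omits the Dirichlet state encoding, uses a plain feature vector as the state, and uses hypothetical sizes; it is not the authors' implementation.

```python
# Sketch of a network with joint label and action heads and a combined loss.
# The Dirichlet state encoding is omitted; all sizes are hypothetical.
import torch
import torch.nn as nn

class ActiveRecognizer(nn.Module):
    def __init__(self, state_dim, n_labels, n_actions, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.label_head = nn.Linear(hidden, n_labels)    # object identity
        self.action_head = nn.Linear(hidden, n_actions)  # per-action values

    def forward(self, state):
        h = self.trunk(state)
        return self.label_head(h), self.action_head(h)

def joint_loss(model, state, label, action_taken, action_return, beta=0.5):
    """Cross-entropy on the label plus a regression on the value of the taken action.
    action_taken: long tensor of action indices; action_return: observed returns."""
    label_logits, action_values = model(state)
    label_loss = nn.functional.cross_entropy(label_logits, label)
    q_taken = action_values.gather(1, action_taken.unsqueeze(1)).squeeze(1)
    action_loss = nn.functional.mse_loss(q_taken, action_return)
    return label_loss + beta * action_loss
```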


computer vision and pattern recognition | 2016

LOMo: Latent Ordinal Model for Facial Analysis in Videos

Karan Sikka; Gaurav Sharma; Marian Stewart Bartlett

We study the problem of facial analysis in videos. We propose a novel weakly supervised learning method that models the video event (expression, pain, etc.) as a sequence of automatically mined, discriminative sub-events (e.g. onset and offset phases for a smile; brow lower and cheek raise for pain). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF, extending such frameworks to approximately model the ordinal, or temporal, structure of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging, publicly available video-based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations. In combination with complementary features, we report state-of-the-art results on these datasets.
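
The ordinal aspect can be made concrete with a small scoring routine: given K sub-event templates, each template selects one frame, the selected frames must respect temporal order, and dynamic programming finds the best assignment. The sketch below assumes the templates are already learned (in the paper they are estimated discriminatively in a weakly supervised loop) and uses hypothetical shapes.

```python
# Sketch of scoring a video with K ordered sub-event templates: each template
# picks one frame, picked frames must appear in temporal order, and dynamic
# programming finds the best ordered assignment. Templates are assumed given.
import numpy as np

def ordinal_score(frames, templates):
    """frames: (T, d) frame features; templates: (K, d) sub-event weight vectors."""
    T = frames.shape[0]
    K = templates.shape[0]
    S = frames @ templates.T                     # (T, K) per-frame, per-template scores
    dp = np.full((K, T), -np.inf)
    dp[0] = np.maximum.accumulate(S[:, 0])       # best frame for template 0 up to time t
    for k in range(1, K):
        prev = dp[k - 1]
        for t in range(k, T):
            # either reuse the best assignment ending before t, or place template k at t
            dp[k, t] = max(dp[k, t - 1], prev[t - 1] + S[t, k])
    return dp[K - 1, T - 1]
```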


british machine vision conference | 2015

Joint Clustering and Classification for Multiple Instance Learning.

Karan Sikka; Ritwik Giri; Marian Stewart Bartlett

The Multiple Instance Learning (MIL) framework has been extensively used to solve weakly labeled visual classification problems, where each image or video is treated as a bag of instances. Instance-space based MIL algorithms adapt standard classifiers by defining the probability that a bag is of the target class as the maximum over the probabilities that its instances are of the target class. Although they are the most commonly used MIL algorithms, they do not account for the possibility that the instances may belong to multiple intermediate concepts, and that these concepts may carry unequal weight in predicting the overall target class. Embedding-space (ES) based MIL approaches tackle this issue by defining a set of concepts, embedding each bag into a concept space, and then training a standard classifier in that embedding space. In previous ES based approaches, the concepts were discovered separately from the classifier and thus were not optimized for the final classification task. Here we propose a novel algorithm that estimates the concepts and the classifier parameters by jointly optimizing a classification loss. This approach discovers a small set of discriminative concepts, which yield superior classification performance. The proposed algorithm is referred to as Joint Clustering and Classification for MIL data (JC2MIL) because the discovered concepts induce clusters of the data instances. In comparison to previous approaches, JC2MIL obtains state-of-the-art results on several MIL datasets: Corel-2000, the image annotation datasets (Elephant, Tiger and Fox), and the UCSB Breast Cancer dataset.
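
The embedding-space idea with jointly refined concepts can be approximated by the alternating loop sketched below: embed each bag by its maximum similarity to each concept center, fit a linear classifier on the embeddings, re-estimate the concepts, and repeat. This alternation is a simplification of the joint optimization in the paper, and the similarity function, number of concepts and data handling are hypothetical.

```python
# Sketch of embedding-space MIL with alternately refined concepts.
# This alternating scheme approximates, but is not, the joint optimization in the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def embed(bags, centers, gamma=1.0):
    """Embed each bag as the max RBF similarity of its instances to each concept center."""
    emb = []
    for bag in bags:
        d2 = ((bag[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n_inst, n_concepts)
        emb.append(np.exp(-gamma * d2).max(axis=0))
    return np.array(emb)

def fit_jc_mil(bags, labels, n_concepts=8, iters=5):
    """bags: list of (n_instances, d) arrays; labels: per-bag class labels."""
    centers = KMeans(n_clusters=n_concepts).fit(np.vstack(bags)).cluster_centers_
    clf = None
    for _ in range(iters):
        X = embed(bags, centers)
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        # k-means style re-estimation of concepts from their nearest instances
        # (the paper instead updates concepts directly against the classification loss).
        assign = [((b[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1) for b in bags]
        for k in range(n_concepts):
            members = [b[a == k] for b, a in zip(bags, assign) if (a == k).any()]
            if members:
                centers[k] = np.vstack(members).mean(axis=0)
    return centers, clf
```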


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

Discriminatively Trained Latent Ordinal Model for Video Classification

Karan Sikka; Gaurav Sharma

We address the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phase for “smile”, running and jumping for “highjump”). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF - it extends such frameworks to model the ordinal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.

Collaboration


Dive into Karan Sikka's collaboration.

Top Co-Authors

Mohsen Malmir

University of California
