Sarah Adel Bargal
Boston University
Publication
Featured research published by Sarah Adel Bargal.
International Conference on Multimodal Interfaces | 2016
Sarah Adel Bargal; Emad Barsoum; Cristian Canton Ferrer; Cha Zhang
This paper presents the implementation details of the proposed solution to the Emotion Recognition in the Wild 2016 Challenge, in the category of video-based emotion recognition. The proposed approach takes the video stream from the audio-video trimmed clips provided by the challenge as input and produces the emotion label corresponding to the video sequence. This output is encoded as one of seven classes: the six basic emotions (Anger, Disgust, Fear, Happiness, Sadness, Surprise) and Neutral. Overall, the system consists of several pipelined modules: face detection, image pre-processing, deep feature extraction, feature encoding, and finally SVM classification. The system achieves 59.42% validation accuracy, surpassing the competition baseline of 38.81%. On test data, it achieves a 56.66% recognition rate, also improving on the competition baseline of 40.47%.
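A minimal sketch of the pipeline described above (face detection, pre-processing, deep feature extraction, clip-level encoding, SVM classification); the mean-pooling encoder, SVM settings, and function names are illustrative assumptions, not the authors' exact configuration:

```python
import numpy as np
from sklearn.svm import SVC

# Seven output classes: the six basic emotions plus Neutral.
EMOTIONS = ["Anger", "Disgust", "Fear", "Happiness", "Sadness", "Surprise", "Neutral"]

def encode_clip(frame_features):
    """Aggregate per-frame CNN features (n_frames x feat_dim) into one
    clip-level descriptor; simple mean pooling stands in for the paper's
    feature-encoding step."""
    return np.asarray(frame_features).mean(axis=0)

def train_emotion_svm(frame_features_per_clip, labels):
    """frame_features_per_clip: list of (n_frames, feat_dim) arrays produced by a
    face-detection + pre-processing + CNN feature-extraction front end.
    labels: one emotion string per training clip."""
    X = np.stack([encode_clip(f) for f in frame_features_per_clip])
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, labels)
    return clf

def predict_emotion(clf, frame_features):
    """Predict the emotion label for one clip's stack of frame features."""
    return clf.predict(encode_clip(frame_features)[None, :])[0]
```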
Pattern Recognition | 2017
Shugao Ma; Sarah Adel Bargal; Jianming Zhang; Leonid Sigal; Stan Sclaroff
We collect three large web action image datasets. We verify through extensive experiments that web action images are complementary to training videos. We show that both filtered and unfiltered web action images are complementary to training videos. We also show the usefulness of web action images in mitigating an artifact of fine-tuning CNN models. Recently, attempts have been made to collect millions of videos to train Convolutional Neural Network (CNN) models for action recognition in videos. However, curating such large-scale video datasets requires immense human labor, and training CNNs on millions of videos demands huge computational resources. In contrast, collecting action images from the Web is much easier, and training on images requires much less computation. In addition, labeled web images tend to contain discriminative action poses, which highlight discriminative portions of a video's temporal progression. Through extensive experiments, we explore the question of whether we can utilize web action images to train better CNN models for action recognition in videos. We collect 23.8K manually filtered images from the Web that depict the 101 actions in the UCF101 action video dataset. We show that by utilizing web action images along with videos in training, significant performance boosts of CNN models can be achieved. We also investigate the scalability of the process by leveraging crawled (unfiltered) web images for UCF101 and ActivityNet. Using unfiltered images, we can achieve performance improvements on par with using filtered images. This means we can further reduce annotation labor and easily scale up to larger problems. We also shed light on an artifact of fine-tuning CNN models that reduces the effective parameters of the CNN, and show that using web action images can significantly alleviate this problem.
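A hedged sketch of the training setup the paper advocates, fine-tuning an action-recognition CNN on video frames mixed with web action images; the directory layout, ResNet-50 backbone, and hyperparameters are illustrative assumptions rather than the paper's exact protocol:

```python
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, models, transforms

tf = transforms.Compose([transforms.Resize(256),
                         transforms.CenterCrop(224),
                         transforms.ToTensor()])

# Frames sampled from training videos and crawled web action images,
# each arranged in per-class folders (hypothetical paths).
video_frames = datasets.ImageFolder("data/ucf101_frames", transform=tf)
web_images = datasets.ImageFolder("data/web_action_images", transform=tf)
loader = DataLoader(ConcatDataset([video_frames, web_images]),
                    batch_size=64, shuffle=True, num_workers=4)

model = models.resnet50(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 101)  # 101 UCF101 action classes
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                    # one fine-tuning epoch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```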
Computer Vision and Image Understanding | 2017
Fatih Cakir; Sarah Adel Bargal; Stan Sclaroff
Fast nearest neighbor search is becoming increasingly crucial given the advent of large-scale data in many computer vision applications. Hashing approaches provide both fast search mechanisms and compact index structures to address this critical need. In image retrieval problems where labeled training data is available, supervised hashing methods prevail over unsupervised methods. Most state-of-the-art supervised hashing approaches employ batch learners. Unfortunately, batch-learning strategies may be inefficient when confronted with large datasets. Moreover, with batch learners, it is unclear how to adapt the hash functions as the dataset continues to grow and new variations appear over time. To handle these issues, we propose OSH: an Online Supervised Hashing technique that is based on Error Correcting Output Codes. We consider a stochastic setting where the data arrives sequentially and our method learns and adapts its hashing functions in a discriminative manner. Our method makes no assumption about the number of possible class labels, and accommodates new classes as they are presented in the incoming data stream. In experiments with three image retrieval benchmarks, our method yields state-of-the-art retrieval performance as measured in Mean Average Precision, while also being orders of magnitude faster than competing batch methods for supervised hashing. It also significantly outperforms recently introduced online hashing solutions.
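A simplified, hypothetical sketch of the online supervised hashing idea: each class label is mapped to an ECOC-style binary codeword, and linear hash functions are updated one sample at a time so that a sample's binary code approaches its class codeword. The hinge-style update, random codeword assignment, and parameters are for illustration only, not the exact OSH algorithm:

```python
import numpy as np

class OnlineSupervisedHash:
    """Online hashing with Error Correcting Output Code style codewords (sketch)."""

    def __init__(self, dim, n_bits, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_bits, dim))  # one hyperplane per bit
        self.codes = {}                                           # label -> codeword in {-1,+1}^n_bits
        self.lr = lr
        self.n_bits = n_bits

    def _codeword(self, label):
        # New classes get a codeword on first appearance, so the number of
        # classes never has to be fixed in advance.
        if label not in self.codes:
            self.codes[label] = self.rng.choice([-1.0, 1.0], size=self.n_bits)
        return self.codes[label]

    def partial_fit(self, x, label):
        c = self._codeword(label)
        margin = c * (self.W @ x)                 # per-bit agreement with the codeword
        violated = margin < 1.0
        self.W[violated] += self.lr * np.outer(c[violated], x)  # hinge-style SGD step

    def hash(self, x):
        return (self.W @ x > 0).astype(np.uint8)  # binary code used for indexing
```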
International Conference on Computer Vision Theory and Applications | 2015
Sarah Adel Bargal; Alexander Welles; Cliff R. Chan; Samuel Howes; Stan Sclaroff; Elizabeth J. Ragan; Courtney Johnson; Christopher J. Gill
We present work in progress on a computer vision application that could directly impact the delivery of healthcare in underdeveloped countries. We describe the development of an image-based smartphone application prototype for ear biometrics. The application targets the public health problem of managing medical records at on-site medical clinics in less developed countries, where many individuals do not hold IDs. The domain presents challenges for an ear biometric system, including varying scale, rotation, and illumination. Since it was not clear which feature descriptors would work best for the application, a comparative study of three ear biometric extraction techniques was performed; one of these was used to develop an iOS application prototype that establishes the identity of an individual from a smartphone camera image. A pilot study was then conducted on the developed application to test feasibility in naturalistic settings.
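One of the descriptor pipelines such a comparison might include, sketched here with ORB keypoints and brute-force Hamming matching; this particular descriptor choice and the scoring heuristic are assumptions for illustration, not necessarily among the three techniques evaluated in the paper:

```python
import cv2

def ear_match_score(path_a, path_b, max_matches=50):
    """Lower score = the two ear images are more likely from the same person."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=500)
    _, des_a = orb.detectAndCompute(img_a, None)
    _, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return float("inf")                       # no usable features found
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    best = matches[:max_matches]
    return sum(m.distance for m in best) / max(len(best), 1)
```

Identification would then amount to ranking enrolled ear images by this score, which is the kind of on-device matching the prototype needs to perform.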
International Journal of Computer Vision | 2018
Jianming Zhang; Sarah Adel Bargal; Zhe Lin; Jonathan Brandt; Xiaohui Shen; Stan Sclaroff
We aim to model the top-down attention of a convolutional neural network (CNN) classifier for generating task-specific attention maps. Inspired by a top-down human visual attention model, we propose a new backpropagation scheme, called Excitation Backprop, to pass top-down signals down the network hierarchy via a probabilistic Winner-Take-All process. Furthermore, we introduce the concept of contrastive attention to make the top-down attention maps more discriminative. We show a theoretical connection between the proposed contrastive attention formulation and the Class Activation Map computation. Efficient implementation of Excitation Backprop for common neural network layers is also presented. In experiments, we visualize the evidence of a model’s classification decision by computing the proposed top-down attention maps. For quantitative evaluation, we report the accuracy of our method in weakly supervised localization tasks on the MS COCO, PASCAL VOC07 and ImageNet datasets. The usefulness of our method is further validated in the text-to-region association task. On the Flickr30k Entities dataset, we achieve promising performance in phrase localization by leveraging the top-down attention of a CNN model that has been trained on weakly labeled web images. Finally, we demonstrate applications of our method in model interpretation and data annotation assistance for facial expression analysis and medical imaging tasks.
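A minimal sketch of one Excitation Backprop step through a fully connected layer, following the probabilistic Winner-Take-All description above: only excitatory (non-negative) weights pass signal, child activations are assumed non-negative (e.g. post-ReLU), and winning probabilities are marginalized over parents. The function name and the normalization details are our simplifications:

```python
import numpy as np

def excitation_backprop_linear(a_child, W, p_parent):
    """One top-down step of Excitation Backprop for a linear layer (sketch).

    a_child  : lower-layer activations, shape (n_child,), assumed non-negative
    W        : weights, shape (n_parent, n_child); parent_i = sum_j W[i, j] * a_child[j]
    p_parent : winning probabilities of the upper layer, shape (n_parent,)
    Returns the marginal winning probabilities of the lower-layer neurons.
    """
    W_pos = np.clip(W, 0.0, None)                  # keep only excitatory connections
    contrib = W_pos * a_child[None, :]             # a_j * w_ij^+ for each parent i, child j
    norm = contrib.sum(axis=1, keepdims=True)      # per-parent normalization Z_i
    norm[norm == 0.0] = 1.0                        # guard against inactive parents
    cond = contrib / norm                          # P(child j wins | parent i)
    return cond.T @ p_parent                       # marginalize over parents
```

Repeating this step layer by layer, from the class output down to an early layer, yields the task-specific attention map; the contrastive variant would subtract the map obtained by negating the target class weights.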
arXiv: Computer Vision and Pattern Recognition | 2018
Mathew Monfort; Bolei Zhou; Sarah Adel Bargal; Alex Andonian; Tom Yan; Kandan Ramakrishnan; Lisa M. Brown; Quanfu Fan; Dan Gutfreund; Carl Vondrick; Aude Oliva
International Conference on Computer Vision | 2017
Fatih Cakir; Kun He; Sarah Adel Bargal; Stan Sclaroff
Computer Vision and Pattern Recognition | 2018
Sarah Adel Bargal; Andrea Zunino; Donghyun Kim; Jianming Zhang; Vittorio Murino; Stan Sclaroff
Computer Vision and Pattern Recognition | 2018
Kun He; Fatih Cakir; Sarah Adel Bargal; Stan Sclaroff
Archive | 2018
Fatih Cakir; Kun He; Sarah Adel Bargal; Stan Sclaroff