Sarah Adel Bargal
Boston University
Publication
Featured research published by Sarah Adel Bargal.
International Conference on Multimodal Interfaces | 2016
Sarah Adel Bargal; Emad Barsoum; Cristian Canton Ferrer; Cha Zhang
This paper presents the implementation details of the proposed solution to the Emotion Recognition in the Wild 2016 Challenge, in the category of video-based emotion recognition. The proposed approach takes the video stream from the audio-video trimmed clips provided by the challenge as input and produces the emotion label corresponding to the video sequence. This output is encoded as one of seven classes: the six basic emotions (Anger, Disgust, Fear, Happiness, Sadness, Surprise) and Neutral. Overall, the system consists of several pipelined modules: face detection, image pre-processing, deep feature extraction, feature encoding, and finally SVM classification. The system achieves 59.42% validation accuracy, surpassing the competition baseline of 38.81%. On test data, it achieves a 56.66% recognition rate, also improving on the competition baseline of 40.47%.
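A minimal sketch of the pipeline described above (face detection, pre-processing, deep feature extraction, clip-level encoding, SVM classification); the mean-pooling encoder, SVM settings, and function names are illustrative assumptions, not the authors' exact configuration:

```python
import numpy as np
from sklearn.svm import SVC

# Seven output classes: the six basic emotions plus Neutral.
EMOTIONS = ["Anger", "Disgust", "Fear", "Happiness", "Sadness", "Surprise", "Neutral"]

def encode_clip(frame_features):
    """Aggregate per-frame CNN features (n_frames x feat_dim) into one
    clip-level descriptor; simple mean pooling stands in for the paper's
    feature-encoding step."""
    return np.asarray(frame_features).mean(axis=0)

def train_emotion_svm(frame_features_per_clip, labels):
    """frame_features_per_clip: list of (n_frames, feat_dim) arrays produced by a
    face-detection + pre-processing + CNN feature-extraction front end.
    labels: one emotion string per training clip."""
    X = np.stack([encode_clip(f) for f in frame_features_per_clip])
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, labels)
    return clf

def predict_emotion(clf, frame_features):
    """Predict the emotion label for one clip's stack of frame features."""
    return clf.predict(encode_clip(frame_features)[None, :])[0]
```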
Pattern Recognition | 2017
Shugao Ma; Sarah Adel Bargal; Jianming Zhang; Leonid Sigal; Stan Sclaroff
We collect three large web action image datasets. We verify through extensive experiments that web action images are complementary to training videos. We show that both filtered and unfiltered web action images are complementary to training videos. We also show the usefulness of web action images in mitigating an artifact of fine-tuning CNN models. Recently, attempts have been made to collect millions of videos to train Convolutional Neural Network (CNN) models for action recognition in videos. However, curating such large-scale video datasets requires immense human labor, and training CNNs on millions of videos demands huge computational resources. In contrast, collecting action images from the Web is much easier, and training on images requires much less computation. In addition, labeled web images tend to contain discriminative action poses, which highlight discriminative portions of a video's temporal progression. Through extensive experiments, we explore the question of whether we can utilize web action images to train better CNN models for action recognition in videos. We collect 23.8K manually filtered images from the Web that depict the 101 actions in the UCF101 action video dataset. We show that by utilizing web action images along with videos in training, significant performance boosts of CNN models can be achieved. We also investigate the scalability of the process by leveraging crawled (unfiltered) web images for UCF101 and ActivityNet. Using unfiltered images, we can achieve performance improvements on par with using filtered images. This means we can further reduce annotation labor and easily scale up to larger problems. We also shed light on an artifact of fine-tuning CNN models that reduces the effective parameters of the CNN, and show that using web action images can significantly alleviate this problem.
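A hedged sketch of the training setup the paper advocates, fine-tuning an action-recognition CNN on video frames mixed with web action images; the directory layout, ResNet-50 backbone, and hyperparameters are illustrative assumptions rather than the paper's exact protocol:

```python
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, models, transforms

tf = transforms.Compose([transforms.Resize(256),
                         transforms.CenterCrop(224),
                         transforms.ToTensor()])

# Frames sampled from training videos and crawled web action images,
# each arranged in per-class folders (hypothetical paths).
video_frames = datasets.ImageFolder("data/ucf101_frames", transform=tf)
web_images = datasets.ImageFolder("data/web_action_images", transform=tf)
loader = DataLoader(ConcatDataset([video_frames, web_images]),
                    batch_size=64, shuffle=True, num_workers=4)

model = models.resnet50(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 101)  # 101 UCF101 action classes
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                    # one fine-tuning epoch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```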
Computer Vision and Image Understanding | 2017
Fatih Cakir; Sarah Adel Bargal; Stan Sclaroff
Fast nearest neighbor search is becoming increasingly crucial given the advent of large-scale data in many computer vision applications. Hashing approaches provide both fast search mechanisms and compact index structures to address this critical need. In image retrieval problems where labeled training data is available, supervised hashing methods prevail over unsupervised methods. Most state-of-the-art supervised hashing approaches employ batch learners. Unfortunately, batch-learning strategies may be inefficient when confronted with large datasets. Moreover, with batch learners, it is unclear how to adapt the hash functions as the dataset continues to grow and new variations appear over time. To handle these issues, we propose OSH: an Online Supervised Hashing technique that is based on Error Correcting Output Codes. We consider a stochastic setting where the data arrives sequentially and our method learns and adapts its hashing functions in a discriminative manner. Our method makes no assumption about the number of possible class labels, and accommodates new classes as they are presented in the incoming data stream. In experiments with three image retrieval benchmarks, our method yields state-of-the-art retrieval performance as measured in Mean Average Precision, while also being orders of magnitude faster than competing batch methods for supervised hashing. It also significantly outperforms recently introduced online hashing solutions.
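A simplified, hypothetical sketch of the online supervised hashing idea: each class label is mapped to an ECOC-style binary codeword, and linear hash functions are updated one sample at a time so that a sample's binary code approaches its class codeword. The hinge-style update, random codeword assignment, and parameters are for illustration only, not the exact OSH algorithm:

```python
import numpy as np

class OnlineSupervisedHash:
    """Online hashing with Error Correcting Output Code style codewords (sketch)."""

    def __init__(self, dim, n_bits, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_bits, dim))  # one hyperplane per bit
        self.codes = {}                                           # label -> codeword in {-1,+1}^n_bits
        self.lr = lr
        self.n_bits = n_bits

    def _codeword(self, label):
        # New classes get a codeword on first appearance, so the number of
        # classes never has to be fixed in advance.
        if label not in self.codes:
            self.codes[label] = self.rng.choice([-1.0, 1.0], size=self.n_bits)
        return self.codes[label]

    def partial_fit(self, x, label):
        c = self._codeword(label)
        margin = c * (self.W @ x)                 # per-bit agreement with the codeword
        violated = margin < 1.0
        self.W[violated] += self.lr * np.outer(c[violated], x)  # hinge-style SGD step

    def hash(self, x):
        return (self.W @ x > 0).astype(np.uint8)  # binary code used for indexing
```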
International Conference on Computer Vision Theory and Applications | 2015
Sarah Adel Bargal; Alexander Welles; Cliff R. Chan; Samuel Howes; Stan Sclaroff; Elizabeth J. Ragan; Courtney Johnson; Christopher J. Gill
We present work in progress on a computer vision application that could directly impact the delivery of healthcare in underdeveloped countries. We describe the development of an image-based smartphone application prototype for ear biometrics. The application targets the public health problem of managing medical records at on-site medical clinics in less developed countries, where many individuals do not hold IDs. The domain presents challenges for an ear biometric system, including varying scale, rotation, and illumination. Since it was not clear which feature descriptors would work best for the application, a comparative study of three ear biometric extraction techniques was performed; one of these was used to develop an iOS application prototype that establishes the identity of an individual from a smartphone camera image. A pilot study was then conducted on the developed application to test feasibility in naturalistic settings.
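One of the descriptor pipelines such a comparison might include, sketched here with ORB keypoints and brute-force Hamming matching; this particular descriptor choice and the scoring heuristic are assumptions for illustration, not necessarily among the three techniques evaluated in the paper:

```python
import cv2

def ear_match_score(path_a, path_b, max_matches=50):
    """Lower score = the two ear images are more likely from the same person."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=500)
    _, des_a = orb.detectAndCompute(img_a, None)
    _, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return float("inf")                       # no usable features found
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    best = matches[:max_matches]
    return sum(m.distance for m in best) / max(len(best), 1)
```

Identification would then amount to ranking enrolled ear images by this score, which is the kind of on-device matching the prototype needs to perform.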
International Journal of Computer Vision | 2018
Jianming Zhang; Sarah Adel Bargal; Zhe Lin; Jonathan Brandt; Xiaohui Shen; Stan Sclaroff
We aim to model the top-down attention of a convolutional neural network (CNN) classifier for generating task-specific attention maps. Inspired by a top-down human visual attention model, we propose a new backpropagation scheme, called Excitation Backprop, to pass top-down signals down the network hierarchy via a probabilistic Winner-Take-All process. Furthermore, we introduce the concept of contrastive attention to make the top-down attention maps more discriminative. We show a theoretical connection between the proposed contrastive attention formulation and the Class Activation Map computation. Efficient implementation of Excitation Backprop for common neural network layers is also presented. In experiments, we visualize the evidence of a model’s classification decision by computing the proposed top-down attention maps. For quantitative evaluation, we report the accuracy of our method in weakly supervised localization tasks on the MS COCO, PASCAL VOC07 and ImageNet datasets. The usefulness of our method is further validated in the text-to-region association task. On the Flickr30k Entities dataset, we achieve promising performance in phrase localization by leveraging the top-down attention of a CNN model that has been trained on weakly labeled web images. Finally, we demonstrate applications of our method in model interpretation and data annotation assistance for facial expression analysis and medical imaging tasks.
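A minimal sketch of one Excitation Backprop step through a fully connected layer, following the probabilistic Winner-Take-All description above: only excitatory (non-negative) weights pass signal, child activations are assumed non-negative (e.g. post-ReLU), and winning probabilities are marginalized over parents. The function name and the normalization details are our simplifications:

```python
import numpy as np

def excitation_backprop_linear(a_child, W, p_parent):
    """One top-down step of Excitation Backprop for a linear layer (sketch).

    a_child  : lower-layer activations, shape (n_child,), assumed non-negative
    W        : weights, shape (n_parent, n_child); parent_i = sum_j W[i, j] * a_child[j]
    p_parent : winning probabilities of the upper layer, shape (n_parent,)
    Returns the marginal winning probabilities of the lower-layer neurons.
    """
    W_pos = np.clip(W, 0.0, None)                  # keep only excitatory connections
    contrib = W_pos * a_child[None, :]             # a_j * w_ij^+ for each parent i, child j
    norm = contrib.sum(axis=1, keepdims=True)      # per-parent normalization Z_i
    norm[norm == 0.0] = 1.0                        # guard against inactive parents
    cond = contrib / norm                          # P(child j wins | parent i)
    return cond.T @ p_parent                       # marginalize over parents
```

Repeating this step layer by layer, from the class output down to an early layer, yields the task-specific attention map; the contrastive variant would subtract the map obtained by negating the target class weights.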
arXiv: Computer Vision and Pattern Recognition | 2018
Mathew Monfort; Bolei Zhou; Sarah Adel Bargal; Alex Andonian; Tom Yan; Kandan Ramakrishnan; Lisa M. Brown; Quanfu Fan; Dan Gutfreund; Carl Vondrick; Aude Oliva
International Conference on Computer Vision | 2017
Fatih Cakir; Kun He; Sarah Adel Bargal; Stan Sclaroff
Computer Vision and Pattern Recognition | 2018
Sarah Adel Bargal; Andrea Zunino; Donghyun Kim; Jianming Zhang; Vittorio Murino; Stan Sclaroff
Computer Vision and Pattern Recognition | 2018
Kun He; Fatih Cakir; Sarah Adel Bargal; Stan Sclaroff
Archive | 2018
Fatih Cakir; Kun He; Sarah Adel Bargal; Stan Sclaroff