Gunnar A. Sigurdsson
Carnegie Mellon University
Publications
Featured research published by Gunnar A. Sigurdsson.
european conference on computer vision | 2016
Gunnar A. Sigurdsson; Gül Varol; Xiaolong Wang; Ali Farhadi; Ivan Laptev; Abhinav Gupta
Computer vision has great potential to help in our daily lives by searching for lost keys, watering flowers or reminding us to take a pill. To succeed at such tasks, computer vision methods need to be trained on real and diverse examples of our daily dynamic scenes. Most such scenes are not particularly exciting, and they typically do not appear on YouTube, in movies or in TV broadcasts. So how do we collect sufficiently many diverse but boring samples representing our lives? We propose a novel Hollywood in Homes approach to collecting such data. Instead of shooting videos in the lab, we ensure diversity by distributing and crowdsourcing the whole process of video creation, from script writing to video recording and annotation. Following this procedure we collect a new dataset, Charades, with hundreds of people recording videos in their own homes, acting out casual everyday activities. The dataset is composed of 9,848 annotated videos with an average length of 30 seconds, showing activities of 267 people from three continents. Each video is annotated with multiple free-text descriptions, action labels, action intervals and classes of interacted objects. In total, Charades provides 27,847 video descriptions, 66,500 temporally localized intervals for 157 action classes and 41,104 labels for 46 object classes. Using this rich data, we evaluate and provide baseline results for several tasks, including action recognition and automatic description generation. We believe that the realism, diversity, and casual nature of this dataset will present unique challenges and new opportunities for the computer vision community.
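The statistics above translate directly into annotation handling. Below is a minimal sketch of turning Charades-style temporal annotations into per-frame multi-label targets for the 157 action classes; the semicolon-separated "class start end" field layout follows the released Charades CSVs, but the helper names and the 1 fps sampling are illustrative assumptions, not the dataset's official tooling.

import numpy as np

def parse_actions(field):
    """Parse a 'c092 11.90 21.20;c147 0.00 12.60' field into (class, start, end) triples."""
    triples = []
    for item in field.split(';'):
        if not item.strip():
            continue
        cls, start, end = item.split()
        triples.append((cls, float(start), float(end)))
    return triples

def frame_labels(triples, length_s, fps=1.0, num_classes=157):
    """Binary matrix: one row per sampled frame, one column per action class."""
    n = int(np.ceil(length_s * fps))
    labels = np.zeros((n, num_classes), dtype=bool)
    for cls, start, end in triples:
        idx = int(cls.lstrip('c'))              # 'c092' -> class index 92
        t0, t1 = int(start * fps), int(np.ceil(end * fps))
        labels[t0:min(t1, n), idx] = True      # mark the action's interval
    return labels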
Journal of Sleep Research | 2016
Erna S. Arnardottir; Bardur Isleifsson; Jón Skírnir Ágústsson; Gunnar A. Sigurdsson; Magdalena Sigurgunnarsdottir; Gudjon T. Sigurđarson; Gudmundur Saevarsson; Atli T. Sveinbjarnarson; Sveinbjorn Hoskuldsson; Thorarinn Gislason
The objective of this study was to compare the methods currently recommended by the American Academy of Sleep Medicine (AASM) for measuring snoring: an acoustic sensor, a piezoelectric sensor and a nasal pressure transducer (cannula). Ten subjects reporting habitual snoring were included in the study, performed at Landspitali University Hospital, Iceland. Snoring was assessed by listening to an air medium microphone located on the patient's chest, compared with listening to two overhead air medium microphones (stereo) and with manual scoring of piezoelectric sensor and nasal cannula vibrations. The chest audio picked up the highest number of snore events of the different snore sensors. The sensitivity and positive predictive value of scoring snore events from the different sensors, compared to the chest audio, were: overhead audio (0.78, 0.98), cannula (0.55, 0.67) and piezoelectric sensor (0.78, 0.92), respectively. The chest audio was capable of detecting snore events with lower volume and higher fundamental frequency than the other sensors. The 200 Hz sampling rate of the cannula and piezoelectric sensor was one of their limitations for detecting snore events. The different snore sensors do not measure snore events in the same manner, and this lack of consistency will affect future research on the clinical significance of snoring; standardization of objective snore measurements is therefore needed. Based on these findings, snore measurements should be audio-based, the use of the cannula as a snore sensor should be discontinued, and the piezoelectric sensor could possibly be modified for improvement.
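For reference, the two reported metrics follow directly from counting matched events against the chest-audio reference: sensitivity = TP / (TP + FN) and positive predictive value = TP / (TP + FP). A minimal sketch, assuming events are represented by onset times in seconds and that a hypothetical matching tolerance pairs candidate events with reference events (the paper does not specify this tolerance):

def match_events(reference, candidate, tol=0.5):
    """Greedily match candidate snore events to reference events within tol seconds."""
    ref = sorted(reference)
    used = [False] * len(ref)
    tp = 0
    for t in sorted(candidate):
        for i, r in enumerate(ref):
            if not used[i] and abs(t - r) <= tol:
                used[i] = True
                tp += 1
                break
    fp = len(candidate) - tp    # candidate events with no reference match
    fn = len(reference) - tp    # reference events the sensor missed
    return tp, fp, fn

def sensitivity_ppv(reference, candidate, tol=0.5):
    tp, fp, fn = match_events(reference, candidate, tol)
    sens = tp / (tp + fn) if tp + fn else float('nan')
    ppv = tp / (tp + fp) if tp + fp else float('nan')
    return sens, ppv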
computer vision and pattern recognition | 2017
Gunnar A. Sigurdsson; Santosh Kumar Divvala; Ali Farhadi; Abhinav Gupta
Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and necessitates reasoning about the sequence of activities, as well as higher-level constructs such as intentions. But how do we model and reason about these? We propose a fully-connected temporal CRF model for reasoning over various aspects of activities, including objects, actions, and intentions, where the potentials are predicted by a deep network. End-to-end training of such structured models is a challenging endeavor: for inference and learning we need to construct mini-batches consisting of whole videos, leading to mini-batches with only a few videos. This causes high correlation between data points, leading to a breakdown of the backpropagation algorithm. To address this challenge, we present an asynchronous variational inference method that allows efficient end-to-end training. Our method achieves a classification mAP of 22.4% on the Charades [42] benchmark, outperforming the state of the art (17.2% mAP), and offers equal gains on the task of temporal localization.
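The asynchronous training idea can be sketched independently of the paper's exact update equations: cache per-video structured messages so the deep network can be trained on individually sampled frames rather than whole-video mini-batches. The PyTorch sketch below is a loose illustration under that assumption; cnn, the message cache, and the moving-average refresh are hypothetical stand-ins, not the authors' implementation.

import torch

# messages[video_id]: cached per-frame structured messages (one row per frame),
# refreshed asynchronously as frames from that video are revisited.
messages = {}

def train_step(cnn, optimizer, video_id, frame, frame_idx, target):
    potentials = cnn(frame)                       # unary potentials for this frame
    q = torch.softmax(potentials + messages[video_id][frame_idx], dim=-1)
    loss = -(target * torch.log(q + 1e-8)).sum()  # cross-entropy against labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                         # asynchronous refresh: update only
        old = messages[video_id][frame_idx]       # this frame's cached contribution
        messages[video_id][frame_idx] = 0.9 * old + 0.1 * potentials.detach()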
european conference on computer vision | 2016
Gunnar A. Sigurdsson; Xinlei Chen; Abhinav Gupta
What does a typical visit to Paris look like? Do people first take photos of the Louvre and then the Eiffel Tower? Can we visually model a temporal event like “Paris Vacation” using current frameworks? In this paper, we explore how we can automatically learn the temporal aspects, or storylines, of visual concepts from web data. Previous attempts focus on consecutive image-to-image transitions and are unsuccessful at recovering the long-term underlying story. Unlike classic RNNs, our novel Skipping Recurrent Neural Network (S-RNN) model does not attempt to predict each and every data point in the sequence. Rather, S-RNN skips through the images in the photo stream to explore the space of all ordered subsets of the albums via an efficient sampling procedure. This approach reduces the negative impact of strong short-term correlations and recovers the latent story more accurately. We show how our learned storylines can be used to analyze, predict, and summarize photo albums from Flickr. Our experimental results provide strong qualitative and quantitative evidence that S-RNN is significantly better than other candidate methods, such as LSTMs, at learning long-term correlations and recovering latent storylines. Moreover, we show how storylines can help machines better understand and summarize photo streams by inferring a brief personalized story for each individual album.
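The core sampling step of such a skipping model is simple to state: instead of consuming every photo, draw an ordered subset of the album and train an otherwise ordinary recurrent model on that subsequence. A minimal sketch, where the subset size and the uniform sampling are illustrative choices rather than the paper's exact procedure:

import random

def sample_skip_sequence(album_len, k):
    """Ordered subset of k photo indices; each step jumps forward, so strong
    short-term correlations between adjacent photos are skipped over."""
    return sorted(random.sample(range(album_len), k))

# Hypothetical usage: feed features of the sampled subset to an RNN/LSTM.
# subset = sample_skip_sequence(len(album_features), k=8)
# sequence = [album_features[i] for i in subset]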
Proceedings of SPIE, the International Society for Optical Engineering | 2014
Gunnar A. Sigurdsson; Jerry L. Prince
Using modern diffusion weighted magnetic resonance imaging protocols, the orientations of multiple neuronal fiber tracts within each voxel can be estimated. Further analysis of these populations, including application of fiber tracking and tract segmentation methods, is often hindered by lack of spatial smoothness of the estimated orientations. For example, a single noisy voxel can cause a fiber tracking method to switch tracts in a simple crossing tract geometry. In this work, a generalized spatial smoothing framework that handles multiple orientations as well as their fractional contributions within each voxel is proposed. The approach estimates an optimal fuzzy correspondence of orientations and fractional contributions between voxels and smooths only between these correspondences. Avoiding a requirement to obtain exact correspondences of orientations reduces smoothing anomalies due to propagation of erroneous correspondences around noisy voxels. Phantom experiments are used to demonstrate both visual and quantitative improvements in postprocessing steps. Improvement over smoothing in the measurement domain is also demonstrated using both phantoms and in vivo human data.
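The notion of a fuzzy correspondence can be illustrated concretely: between two neighboring voxels, build soft matching weights from the sign-invariant cosine similarity of their orientations, then smooth each orientation and its fractional contribution only along those weighted matches. A minimal numpy sketch under simplifying assumptions (fixed per-voxel orientation count, a softmax correspondence, a single smoothing step); the paper's actual estimation of the optimal correspondence is not reproduced here:

import numpy as np

def fuzzy_smooth_step(orients_a, fracs_a, orients_b, fracs_b, beta=10.0, lam=0.3):
    """One smoothing step between neighboring voxels a and b.

    orients_*: (k, 3) unit orientation vectors; fracs_*: (k,) fractional
    contributions. |cosine| similarity makes the matching antipodally
    symmetric; sign alignment keeps averaged vectors in one hemisphere.
    """
    dots = orients_a @ orients_b.T
    corr = np.exp(beta * np.abs(dots))
    corr /= corr.sum(axis=1, keepdims=True)      # fuzzy correspondence weights
    signs = np.sign(dots)
    matched = (corr * signs) @ orients_b         # weighted, sign-aligned matches
    smoothed = (1 - lam) * orients_a + lam * matched
    smoothed /= np.linalg.norm(smoothed, axis=1, keepdims=True)
    smoothed_fracs = (1 - lam) * fracs_a + lam * (corr @ fracs_b)
    return smoothed, smoothed_fracs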
Proceedings of SPIE | 2015
Gunnar A. Sigurdsson; Zhen Yang; Trac D. Tran; Jerry L. Prince
Many types of diseases manifest themselves as observable changes in the shape of the affected organs. Using shape classification, we can look for signs of disease and discover relationships between diseases. We formulate the problem of shape classification in a holistic framework that utilizes a lossless scalar field representation and a non-parametric classification based on sparse recovery. This framework generalizes over certain classes of unseen shapes while using the full information of the shape, bypassing feature extraction. The output of the method is the class whose combination of exemplars most closely approximates the shape, and furthermore, the algorithm returns the most similar exemplars along with their similarity to the shape, which makes the result simple to interpret. Our results show that the method offers accurate classification between three cerebellar diseases and controls in a database of cerebellar ataxia patients. For reproducible comparison, promising results are presented on publicly available 2D datasets, including the ETH-80 dataset where the method achieves 88.4% classification accuracy.
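The non-parametric classification step is the classic sparse-representation recipe: express the query as a sparse combination of all training exemplars, then score each class by how well its exemplars alone reconstruct the query. A minimal sketch using scikit-learn's Lasso as the sparse solver; the paper's lossless scalar-field shape representation is replaced here by a generic descriptor vector.

import numpy as np
from sklearn.linear_model import Lasso

def src_classify(exemplars, labels, query, alpha=0.01):
    """Sparse-recovery classification.

    exemplars: (d, n) matrix whose columns are training shape descriptors;
    labels: (n,) class label per exemplar; query: (d,) descriptor to classify.
    Returns the best class and the sparse coefficients, whose largest entries
    point at the most similar exemplars (making the result easy to interpret).
    """
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    lasso.fit(exemplars, query)
    x = lasso.coef_
    best_cls, best_res = None, np.inf
    for cls in np.unique(labels):
        x_cls = np.where(labels == cls, x, 0.0)   # this class's coefficients only
        res = np.linalg.norm(query - exemplars @ x_cls)
        if res < best_res:
            best_cls, best_res = cls, res
    return best_cls, x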
computer vision and pattern recognition | 2018
Gunnar A. Sigurdsson; Abhinav Gupta; Cordelia Schmid; Ali Farhadi; Karteek Alahari
Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person, egocentric perspective, and seamlessly transfer knowledge between the third-person (observer) and first-person (actor) views. Despite this, learning such models for human action recognition has not been achievable due to the lack of data. This paper takes a step in this direction with the introduction of Charades-Ego, a large-scale dataset of paired first-person and third-person videos involving 112 people and 4,000 paired videos. This enables learning the link between the two perspectives, actor and observer. We thereby address one of the biggest bottlenecks facing egocentric vision research, providing a link from first-person video to the abundant third-person data on the web. We use this data to learn a joint representation of first- and third-person videos with only weak supervision, and show its effectiveness for transferring knowledge from the third-person to the first-person domain.
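A common way to realize such a joint representation is a shared embedding trained with a ranking loss over paired clips. The PyTorch sketch below pulls paired first-/third-person embeddings together and pushes in-batch mismatches apart; it is a generic stand-in for the paper's weakly supervised objective, and the encoder producing the embeddings is assumed to exist.

import torch
import torch.nn.functional as F

def pair_alignment_loss(ego_emb, third_emb, margin=0.5):
    """Triplet-style loss on (batch, dim) embeddings of paired clips: the
    matched first-/third-person pair should score higher than the hardest
    mismatched pair in the batch."""
    ego = F.normalize(ego_emb, dim=1)
    third = F.normalize(third_emb, dim=1)
    sim = ego @ third.t()                            # cosine similarities
    pos = sim.diag()                                 # matched pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    neg = sim.masked_fill(mask, float('-inf')).max(dim=1).values
    return F.relu(neg - pos + margin).mean()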
international conference on computer vision | 2017
Gunnar A. Sigurdsson; Olga Russakovsky; Abhinav Gupta
national conference on artificial intelligence | 2016
Gunnar A. Sigurdsson; Olga Russakovsky; Ali Farhadi; Ivan Laptev; Abhinav Gupta