Victor Escorcia
King Abdullah University of Science and Technology
Publications
Featured research published by Victor Escorcia.
computer vision and pattern recognition | 2015
Fabian Caba Heilbron; Victor Escorcia; Bernard Ghanem; Juan Carlos Niebles
In spite of many dataset efforts for human action recognition, current computer vision algorithms are still severely limited in terms of the variability and complexity of the actions that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on simple actions and movements occurring in manually trimmed videos. In this paper we introduce ActivityNet, a new large-scale video benchmark for human activity understanding. Our benchmark aims at covering a wide range of complex human activities that are of interest to people in their daily living. In its current version, ActivityNet provides samples from 203 activity classes with an average of 137 untrimmed videos per class and 1.41 activity instances per video, for a total of 849 video hours. We illustrate three scenarios in which ActivityNet can be used to compare algorithms for human activity understanding: untrimmed video classification, trimmed activity classification, and activity detection.
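As a rough illustration of the detection scenario, the sketch below computes temporal intersection-over-union, the kind of overlap measure commonly used to score predicted activity segments against ground-truth instances in untrimmed videos. It is an illustrative Python snippet, not code from the benchmark's own toolkit.

def temporal_iou(pred, gt):
    """Overlap between two temporal segments, each given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# Example: a detection spanning 12-30 s against a ground-truth instance at 15-35 s.
print(temporal_iou((12.0, 30.0), (15.0, 35.0)))  # 15 / 23 ≈ 0.652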
european conference on computer vision | 2016
Victor Escorcia; Fabian Caba Heilbron; Juan Carlos Niebles; Bernard Ghanem
Object proposals have contributed significantly to recent advances in object understanding in images. Inspired by the success of this approach, we introduce Deep Action Proposals (DAPs), an effective and efficient algorithm for generating temporal action proposals from long videos. We show how to take advantage of the vast capacity of deep learning models and memory cells to retrieve, from untrimmed videos, temporal segments that are likely to contain actions. A comprehensive evaluation indicates that our approach outperforms previous work on a large-scale action benchmark, runs at 134 FPS, making it practical for large-scale scenarios, and exhibits an appealing ability to generalize, i.e., to retrieve good-quality temporal proposals of actions unseen in training.
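The general idea of scoring candidate temporal segments with a recurrent (memory-cell) model can be sketched as below. This is an illustrative Python/PyTorch sketch, not the released DAPs code; the feature dimension, hidden size, and number of anchor lengths are placeholder assumptions.

import torch
import torch.nn as nn

class ActionProposalScorer(nn.Module):
    """Hypothetical sketch: an LSTM reads a stream of visual features and, at each
    step, scores num_anchors candidate segment lengths ending at that step."""
    def __init__(self, feat_dim=500, hidden_dim=256, num_anchors=16):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, num_anchors)

    def forward(self, feats):                      # feats: (batch, time, feat_dim)
        hidden, _ = self.lstm(feats)
        return torch.sigmoid(self.score(hidden))   # (batch, time, num_anchors)

feats = torch.randn(1, 128, 500)        # 128 feature snippets from an untrimmed video
scores = ActionProposalScorer()(feats)
print(scores.shape)                     # torch.Size([1, 128, 16])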
Construction Research Congress 2012: Construction Challenges in a Flat World | 2012
Victor Escorcia; María A. Dávila; Mani Golparvar-Fard; Juan Carlos Niebles
In this paper we present a novel method for reliable recognition of construction workers and their actions using color and depth data from a Microsoft Kinect sensor. Our algorithm is based on machine learning techniques, in which meaningful visual features are extracted based on the estimated body pose of workers. We adopt a bag-of-poses representation for worker actions and combine it with powerful discriminative classifiers to achieve accurate action recognition. The discriminative framework is able to focus on the visual aspects that are distinctive and can detect and recognize actions from different workers. We train and test our algorithm using 80 videos from four workers involved in five drywall-related construction activities. These videos were all collected from drywall construction activities inside a dining hall facility under construction. The proposed algorithm is further validated by recognizing the actions of a construction worker who was never seen in the training dataset. Experimental results show that our method achieves an average precision of 85.28 percent. The results reflect the promise of the proposed method for automated assessment of craftsmen productivity, safety, and occupational health in indoor environments.
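A bag-of-poses pipeline can be sketched as clustering per-frame pose descriptors into a codebook, histogramming each video over the codewords, and training a discriminative classifier on the histograms. The Python sketch below is purely illustrative; the descriptor dimensionality, codebook size, and classifier are placeholder assumptions, not the authors' implementation.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

# Hypothetical data: each video yields a (num_frames, pose_dim) array of joint-based descriptors.
rng = np.random.default_rng(0)
train_videos = [rng.normal(size=(60, 30)) for _ in range(20)]
train_labels = rng.integers(0, 5, size=20)          # five drywall-related activities

# 1. Learn a codebook of prototypical poses from all training frames.
codebook = KMeans(n_clusters=50, n_init=10, random_state=0).fit(np.vstack(train_videos))

# 2. Represent each video as a normalized histogram over pose codewords (bag of poses).
def bag_of_poses(video):
    words = codebook.predict(video)
    hist = np.bincount(words, minlength=50).astype(float)
    return hist / hist.sum()

X = np.array([bag_of_poses(v) for v in train_videos])

# 3. Train a discriminative classifier on the histograms.
clf = LinearSVC().fit(X, train_labels)
print(clf.predict(X[:3]))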
computer vision and pattern recognition | 2017
Shyamal Buch; Victor Escorcia; Chuanqi Shen; Bernard Ghanem; Juan Carlos Niebles
Our paper presents a new approach for temporal detection of human actions in long, untrimmed video sequences. We introduce Single-Stream Temporal Action Proposals (SST), a new effective and efficient deep architecture for the generation of temporal action proposals. Our network can run continuously in a single stream over very long input video sequences, without the need to divide input into short overlapping clips or temporal windows for batch processing. We demonstrate empirically that our model outperforms the state-of-the-art on the task of temporal action proposal generation, while achieving some of the fastest processing speeds in the literature. Finally, we demonstrate that using SST proposals in conjunction with existing action classifiers results in improved state-of-the-art temporal action detection performance.
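To illustrate the single-stream property, the sketch below carries a recurrent hidden state across consecutive feature chunks, so an arbitrarily long video is processed in one pass with no overlapping clips or re-windowing. It is an illustrative Python/PyTorch sketch under assumed feature and anchor sizes, not the SST implementation.

import torch
import torch.nn as nn

# Streaming proposal scoring: the recurrent state persists between chunks.
gru = nn.GRU(input_size=500, hidden_size=256, batch_first=True)
head = nn.Linear(256, 32)                    # 32 anchor segment lengths per time step

state = None
for chunk in torch.randn(10, 1, 64, 500):    # ten consecutive 64-step feature chunks
    out, state = gru(chunk, state)           # state is carried forward, never reset
    proposal_scores = torch.sigmoid(head(out))
print(proposal_scores.shape)                 # torch.Size([1, 64, 32])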
international conference on computer vision | 2013
Victor Escorcia; Juan Carlos Niebles
We introduce a new method for representing the dynamics of human-object interactions in videos. Previous algorithms tend to focus on modeling the spatial relationships between objects and actors, but ignore the evolving nature of this relationship through time. Our algorithm captures the dynamic nature of human-object interactions by modeling how these patterns evolve with respect to time. Our experiments show that encoding such temporal evolution is crucial for correctly discriminating human actions that involve similar objects and spatial human-object relationships, but only differ in the temporal aspect of the interaction, e.g., answer phone and dial phone. We validate our approach on two human activity datasets and show performance improvements over competing state-of-the-art representations.
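One way to picture the idea (a hypothetical sketch, not the paper's descriptor) is to compute a person-relative encoding of the object box in each frame and pool it over consecutive temporal stages, so that the evolution of the spatial relation, and not just its average layout, is preserved.

import numpy as np

def relation(person, obj):                  # boxes as (x, y, w, h)
    px, py, pw, ph = person
    ox, oy, ow, oh = obj
    return np.array([(ox - px) / pw, (oy - py) / ph, ow / pw, oh / ph])

def temporal_descriptor(person_tracks, obj_tracks, stages=3):
    rel = np.array([relation(p, o) for p, o in zip(person_tracks, obj_tracks)])
    # Mean-pool within each stage and concatenate, so interactions that share a similar
    # spatial layout but differ over time yield different descriptors.
    return np.concatenate([s.mean(axis=0) for s in np.array_split(rel, stages)])

T = 30
person = [(100, 50, 60, 160)] * T
phone = [(110 + t, 200 - 4 * t, 15, 25) for t in range(T)]   # object moves toward the head
print(temporal_descriptor(person, phone).shape)              # (12,)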
computer vision and pattern recognition | 2017
Fabian Caba Heilbron; Wayner Barrios; Victor Escorcia; Bernard Ghanem
Despite the recent advances in large-scale video analysis, action detection remains one of the most challenging unsolved problems in computer vision. This snag is in part due to the large volume of data that needs to be analyzed to detect actions in videos. Existing approaches have mitigated the computational cost, but these methods still lack the rich high-level semantics that would help them localize actions quickly. In this paper, we introduce a Semantic Cascade Context (SCC) model that aims to detect actions in long video sequences. By embracing semantic priors associated with human activities, SCC produces high-quality class-specific action proposals and prunes unrelated activities in a cascade fashion. Experimental results on ActivityNet show that SCC achieves state-of-the-art performance for action detection while operating in real time.
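The cascade idea can be caricatured as a cheap semantic-prior stage that discards unlikely segments before an expensive classifier runs. The sketch below is a toy Python illustration with stand-in scoring functions, not the SCC model itself.

def cascade_detect(segments, semantic_prior, classifier, prune_thresh=0.3):
    detections = []
    for seg in segments:
        prior = semantic_prior(seg)           # cheap stage: e.g., object/scene evidence
        if prior < prune_thresh:
            continue                          # pruned early, the classifier never runs
        score = prior * classifier(seg)       # expensive stage only on the survivors
        detections.append((seg, score))
    return detections

# Toy usage with stand-in scoring functions.
segments = [(0, 10), (12, 30), (35, 40)]
dets = cascade_detect(segments,
                      semantic_prior=lambda s: 0.8 if s[0] > 10 else 0.1,
                      classifier=lambda s: 0.9)
print(dets)   # two surviving segments, each scored 0.8 * 0.9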
european conference on computer vision | 2018
Humam Alwassel; Fabian Caba Heilbron; Victor Escorcia; Bernard Ghanem
Despite the recent progress in video understanding and the continuous rate of improvement in temporal action localization throughout the years, it is still unclear how far (or close?) we are to solving the problem. To this end, we introduce a new diagnostic tool to analyze the performance of temporal action detectors in videos and compare different methods beyond a single scalar metric. We exemplify the use of our tool by analyzing the performance of the top-rewarded entries in the latest ActivityNet action localization challenge. Our analysis shows that the most impactful areas to work on are: strategies to better handle temporal context around the instances, improving the robustness w.r.t. the instance absolute and relative size, and strategies to reduce the localization errors. Moreover, our experimental analysis finds that the lack of agreement among annotators is not a major roadblock to attaining progress in the field. Our diagnostic tool is publicly available to keep fueling the minds of other researchers with additional insights about their algorithms.
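In the spirit of looking beyond a single scalar metric, the sketch below breaks detector recall down by ground-truth instance duration. It is an illustrative Python example with made-up numbers and bin edges, not the released diagnostic tool.

import numpy as np

def recall_by_length(gt_lengths, matched, edges=(0, 30, 120, np.inf)):
    """Recall computed separately for short, medium, and long ground-truth instances."""
    gt_lengths, matched = np.asarray(gt_lengths), np.asarray(matched)
    report = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (gt_lengths >= lo) & (gt_lengths < hi)
        report[f"[{lo}, {hi}) s"] = matched[mask].mean() if mask.any() else float("nan")
    return report

lengths = [5, 12, 45, 80, 200, 400]      # ground-truth instance durations (seconds)
hit = [0, 1, 1, 1, 0, 1]                 # 1 if matched by some detection at the IoU threshold
print(recall_by_length(lengths, hit))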
computer vision and pattern recognition | 2015
Victor Escorcia; Juan Carlos Niebles; Bernard Ghanem
arXiv: Computer Vision and Pattern Recognition | 2017
Bernard Ghanem; Juan Carlos Niebles; Cees Snoek; Fabian Caba Heilbron; Humam Alwassel; Ranjay Krishna; Victor Escorcia; Kenji Hata; Shyamal Buch
british machine vision conference | 2017
Shyamal Buch; Victor Escorcia; Bernard Ghanem; Li Fei-Fei; Juan Carlos Niebles