Jiasen Lu
Virginia Tech
Publications
Featured research published by Jiasen Lu.
international conference on computer vision | 2015
Stanislaw Antol; Aishwarya Agrawal; Jiasen Lu; Margaret Mitchell; Dhruv Batra; C. Lawrence Zitnick; Devi Parikh
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines for VQA are provided and compared with human performance.
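The abstract notes that VQA is amenable to automatic evaluation because many answers are short. A minimal sketch of a consensus-style accuracy metric in that spirit: an answer counts as fully correct when enough of the human annotators gave the same answer (the function name and the threshold of 3 matches out of 10 annotators are illustrative assumptions, not a verbatim reproduction of the official evaluation code).

```python
def vqa_accuracy(predicted, human_answers):
    # Consensus-style accuracy sketch: credit scales with the number of
    # annotators who gave the same answer, capped at 1.0 once at least
    # 3 of the (typically 10) annotators agree with the prediction.
    matches = sum(1 for a in human_answers if a == predicted)
    return min(matches / 3.0, 1.0)
```

Because answers are only a few words long, exact string matching (after normalization) is enough to score millions of question-answer pairs automatically.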
computer vision and pattern recognition | 2017
Jiasen Lu; Caiming Xiong; Devi Parikh; Richard Socher
Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as "the" and "of". Other words that may seem visual can often be predicted reliably from the language model alone, e.g., "sign" after "behind a red stop" or "phone" following "talking on a cell". In this paper, we propose a novel adaptive attention model with a visual sentinel. At each time step, our model decides whether to attend to the image (and if so, to which regions) or to the visual sentinel, extracting the information needed for sequential word generation. We test our method on the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach sets the new state of the art by a significant margin.
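The sentinel mechanism described above can be sketched as attention over k image regions plus one extra candidate, the sentinel vector; the weight the softmax places on the sentinel acts as a gate for "do not look at the image this step". This is a minimal numpy sketch under assumed shapes and a single illustrative bilinear scoring matrix `W` (the function and parameter names are hypothetical, not the paper's implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(region_feats, sentinel, hidden, W):
    """Attend over k image regions plus a visual sentinel.

    region_feats: (k, d) image region features
    sentinel:     (d,)   visual sentinel from the decoder
    hidden:       (d,)   current decoder hidden state
    W:            (d, d) illustrative bilinear scoring weights
    """
    # Treat the sentinel as a (k+1)-th attention candidate.
    candidates = np.vstack([region_feats, sentinel])   # (k+1, d)
    scores = candidates @ W @ hidden                   # (k+1,)
    alpha = softmax(scores)
    beta = alpha[-1]   # weight on the sentinel: how much to NOT look
    context = (alpha[:, None] * candidates).sum(axis=0)
    return context, beta
```

When the next word is non-visual ("the", "of"), a trained model would drive `beta` toward 1, so the context vector is dominated by the sentinel rather than the image regions.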
computer vision and pattern recognition | 2015
Jiasen Lu; Ran Xu; Jason J. Corso
Detailed analysis of human action, such as action classification, detection and localization has received increasing attention from the community; datasets like JHMDB have made it plausible to conduct studies analyzing the impact that such deeper information has on the greater action understanding problem. However, detailed automatic segmentation of human action has comparatively been unexplored. In this paper, we take a step in that direction and propose a hierarchical MRF model to bridge low-level video fragments with high-level human motion and appearance; novel higher-order potentials connect different levels of the supervoxel hierarchy to enforce the consistency of the human segmentation by pulling from different segment-scales. Our single layer model significantly outperforms the current state-of-the-art on actionness, and our full model improves upon the single layer baselines in action segmentation.
neural information processing systems | 2016
Jiasen Lu; Jianwei Yang; Dhruv Batra; Devi Parikh
empirical methods in natural language processing | 2017
Alexander H. Miller; Will Feng; Dhruv Batra; Antoine Bordes; Adam Fisch; Jiasen Lu; Devi Parikh; Jason Weston
neural information processing systems | 2017
Jiasen Lu; Anitha Kannan; Jianwei Yang; Devi Parikh; Dhruv Batra
arXiv: Computer Vision and Pattern Recognition | 2016
Jiasen Lu; Jianwei Yang; Dhruv Batra; Devi Parikh
computer vision and pattern recognition | 2018
Jiasen Lu; Jianwei Yang; Dhruv Batra; Devi Parikh
arXiv: Robotics | 2018
Jianwei Yang; Jiasen Lu; Stefan Lee; Dhruv Batra; Devi Parikh
arXiv: Computer Vision and Pattern Recognition | 2018
Jianwei Yang; Jiasen Lu; Stefan Lee; Dhruv Batra; Devi Parikh