2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW) | 2019
SUN-Spot: An RGB-D Dataset With Spatial Referring Expressions
Abstract
We introduce SUN-Spot, a new dataset for localizing objects using spatial referring expressions (REs). SUN-Spot is the only RE dataset that uses RGB-D images. It also contains more spatial prepositions on average and more cluttered scenes than previous RE datasets. Using a simple baseline, we show that including a depth channel in RE models can improve performance on both generation and comprehension.