Proceedings of the 29th ACM International Conference on Multimedia | 2021
Better Learning Shot Boundary Detection via Multi-task
Abstract
Shot boundary detection (SBD) plays an important role in video understanding, since most recent works take the shot, rather than the frame, as the minimal unit of granularity for upstream tasks. However, the large variation among hard-cut and gradual-change transitions between shots significantly limits SBD performance. To handle this variation, we propose a multi-task architecture called Transnet++. Transnet++ disentangles the two types of transition and predicts each with a separate branch. The two branches share the same video knowledge space, and their outputs are fused for the final prediction. Moreover, we propose a spatial attention module (SAM) to enhance feature representations that suffer from redundant padding regions. A temporal attention module (TAM) is also applied to capture long-term information across the video, alleviating the over-segmentation problem. Experimental results on the Tencent AVS Dataset (91.16% F1-score) demonstrate the effectiveness and superiority of Transnet++ for SBD.
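The abstract states that the hard-cut and gradual-change branches are fused for the final prediction and that results are reported as an F1-score, but it does not give the fusion rule. The following is a minimal sketch under stated assumptions. Each branch is assumed to emit a per-frame transition probability, fusion is assumed to be an element-wise max, and boundaries are assumed to be frames scoring above a hypothetical 0.5 threshold. `fuse_branches`, `predict_boundaries`, and `f1_score` are hypothetical helpers, not the authors' code, and the F1 computation uses exact frame matching for simplicity.

```python
from typing import List, Sequence, Set


def fuse_branches(hard_cut: Sequence[float], gradual: Sequence[float]) -> List[float]:
    """Fuse per-frame transition probabilities from the two branches.

    Assumption: element-wise max; the paper's actual fusion rule may differ.
    """
    if len(hard_cut) != len(gradual):
        raise ValueError("both branches must score the same frames")
    return [max(h, g) for h, g in zip(hard_cut, gradual)]


def predict_boundaries(scores: Sequence[float], threshold: float = 0.5) -> Set[int]:
    """Return indices of frames whose fused score exceeds the (assumed) threshold."""
    return {i for i, s in enumerate(scores) if s > threshold}


def f1_score(predicted: Set[int], truth: Set[int]) -> float:
    """F1 with exact frame matching (SBD benchmarks often allow a frame tolerance)."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy per-frame scores: frame 1 looks like a hard cut,
    # frames 3-4 look like part of a gradual change.
    hard = [0.1, 0.9, 0.2, 0.1, 0.1, 0.0]
    grad = [0.0, 0.1, 0.3, 0.7, 0.6, 0.1]
    boundaries = predict_boundaries(fuse_branches(hard, grad))
    print(sorted(boundaries))
    print(f1_score(boundaries, {1, 4}))
```

Taking the max lets either branch trigger a boundary on its own, which matches the idea of specialized branches; a learned fusion layer would be an equally plausible reading of the abstract.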