Journal of Electronic Imaging | 2021
ADCI-Net: an adaptive discriminative clip identification strategy for fast video action recognition
Abstract
A common approach to video-level classification is to run a model over every fixed-temporal-length clip in a long video. However, a video usually contains several uninformative sections, and feeding near-duplicate clips into the model can lead to suboptimal recognition accuracy and wasted computation. To address this issue, we introduce an adaptive discriminative clip identification network (ADCI-Net) that scores every clip by its relevance, adaptively selects the top-ranked clips as inputs for prediction, and filters out irrelevant ones. Specifically, given a trained cumbersome convolutional neural network for action recognition, we use a lightweight hallucination network (H-Net), trained via knowledge distillation, to imitate its features. Clip relevance is then evaluated from the imitated H-Net features, so heavy computation on overlapping or uninformative clips is avoided at inference time. We validate our approach on two datasets, UCF101 and HMDB51; the strategy can be applied to any clip-based action recognition model. Experimental results show that on UCF101 we reduce the computational cost by 70% while improving accuracy.
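The select-then-predict pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's method: the scoring rule (distance of each clip's imitated feature from the video-level mean) and all function names are assumptions standing in for ADCI-Net's actual relevance criterion.

```python
import numpy as np

def select_discriminative_clips(clip_features, k):
    """Rank clips by how far their (imitated, H-Net-style) features lie
    from the video-level mean feature, and keep the k most distinctive.

    Hypothetical scoring rule used only for illustration; ADCI-Net's
    actual relevance evaluation is defined in the paper.
    """
    mean_feat = clip_features.mean(axis=0)          # video-level mean feature
    scores = np.linalg.norm(clip_features - mean_feat, axis=1)
    top_k = np.argsort(scores)[::-1][:k]            # indices of top-k clips
    return np.sort(top_k), scores

# Toy usage: 10 clips, each summarized by a 128-dim feature vector.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 128))
keep, scores = select_discriminative_clips(features, k=3)
# Only the clips in `keep` would be passed to the heavy recognition model.
```

In the full pipeline, the heavy model then runs only on the selected clips and its per-clip predictions are aggregated (e.g., averaged) into the video-level label, which is where the computational saving comes from.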