IEEE Internet of Things Journal | 2021

Video Scene Segmentation Using Tensor-Train Faster-RCNN for Multimedia IoT Systems


Abstract


Video surveillance techniques such as scene segmentation are playing an increasingly important role in multimedia Internet-of-Things (IoT) systems. However, existing deep learning-based methods face challenges in both accuracy and memory consumption when deployed on edge computing devices with limited computing resources. To address these challenges, a tensor-train video scene segmentation scheme is proposed that compares the local background information in regional scene boundary boxes across adjacent frames. Compared with existing methods, the proposed scheme achieves competitive performance in both segmentation accuracy and parameter compression rate. Specifically, first, an improved faster region-based convolutional neural network (Faster-RCNN) model is proposed to recognize and generate a large number of foreground and background region boxes, from which the boundary boxes are obtained. The foreground boxes containing sparse objects are then removed, and the remaining boxes are treated as candidate background boxes used to measure the similarity between two adjacent frames. Second, to improve training efficiency and reduce memory size, a general and efficient training method is proposed that uses tensor-train decomposition to factorize the input-to-hidden weight matrix. Finally, experiments are conducted to evaluate the performance of the proposed scheme in terms of accuracy and model compression. The results demonstrate that the proposed model improves training efficiency and saves memory space for the deep computation model while maintaining good accuracy. This work highlights the potential of applying artificial intelligence methods on edge computing devices in multimedia IoT systems.
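
To illustrate the tensor-train factorization of the input-to-hidden weight matrix mentioned above, the following sketch applies the standard TT-SVD procedure to a dense weight matrix. The mode shapes, the maximum rank, and the helper name tt_factorize are illustrative assumptions for this sketch, not the authors' implementation.

# Minimal sketch of tensor-train (TT) factorization of a fully connected
# layer's input-to-hidden weight matrix via sequential truncated SVD
# (TT-SVD). Shapes, ranks, and the helper name are assumptions.
import numpy as np

def tt_factorize(W, in_modes, out_modes, max_rank):
    """Approximate W (prod(in_modes) x prod(out_modes)) by TT cores."""
    d = len(in_modes)
    # Reshape so each TT core handles one (input mode, output mode) pair.
    T = W.reshape(list(in_modes) + list(out_modes))
    perm = [None] * (2 * d)
    perm[0::2] = range(d)            # interleave input and output modes
    perm[1::2] = range(d, 2 * d)
    T = T.transpose(perm)

    cores, rank = [], 1
    for k in range(d - 1):
        T = T.reshape(rank * in_modes[k] * out_modes[k], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(rank, in_modes[k], out_modes[k], r))
        T = S[:r, None] * Vt[:r]     # carry the remainder to the next core
        rank = r
    cores.append(T.reshape(rank, in_modes[-1], out_modes[-1], 1))
    return cores

# Example: compress a 784 x 625 weight matrix with modes (4,7,4,7) x (5,5,5,5).
W = np.random.randn(784, 625)
cores = tt_factorize(W, (4, 7, 4, 7), (5, 5, 5, 5), max_rank=8)
print("dense params:", W.size, "TT params:", sum(c.size for c in cores))

For this example, the four TT cores hold roughly 4,000 parameters instead of 490,000 for the dense matrix, which is the kind of compression that makes deployment on memory-constrained edge devices feasible.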

Volume 8
Pages 9697-9705
DOI 10.1109/JIOT.2020.3022353
Language English
Journal IEEE Internet of Things Journal
