Proceedings of the 29th ACM International Conference on Multimedia | 2021

Dynamic Knowledge Distillation with Cross-Modality Knowledge Transfer

 

Abstract


Supervised learning for vision tasks has achieved great success because of advances in deep learning research in many areas, such as high-quality datasets, network architectures, and regularization methods. In the vanilla deep learning paradigm, training a model for visual tasks relies mainly on the provided training images and annotations. Inspired by human learning with knowledge transfer, where information from multiple modalities is considered, we propose to improve performance on visual tasks by introducing explicit knowledge extracted from other modalities. As a first step, we propose to improve image classification performance by introducing linguistic knowledge as additional constraints in model learning. This knowledge is represented as a set of constraints to be jointly utilized with visual knowledge. To coordinate the training dynamics, we equip our model with the ability to dynamically distill from multiple knowledge sources. This is done via a model-agnostic knowledge weighting module that guides the learning process and is updated via meta-steps during training. Preliminary experiments on various benchmark datasets validate the efficacy of our method. Our code will be made publicly available to ensure reproducibility.
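The abstract outlines the mechanism only at a high level: per-source losses (task loss, visual distillation, linguistic constraints) are combined with learned weights, and the weighting module is updated by a meta-step that differentiates a held-out loss through a virtual student update. The following is a minimal, hypothetical PyTorch sketch of that idea; the linear student, the loss-conditioned weighting network, and the synthetic batches and teachers are all illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of dynamic knowledge weighting via a meta-step.
# The linear student, the "weigher" network, and all data are assumptions
# made for illustration; they are not the authors' implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
feat_dim, num_classes, lr = 32, 10, 0.1

# Student: a linear classifier with explicit parameters, so the inner
# update can stay differentiable w.r.t. the knowledge weights.
W = torch.zeros(num_classes, feat_dim, requires_grad=True)
b = torch.zeros(num_classes, requires_grad=True)

# Knowledge weighting module (assumed form): maps the current per-source
# losses to a softmax-normalized weight for each knowledge source.
weigher = torch.nn.Sequential(torch.nn.Linear(3, 16), torch.nn.ReLU(),
                              torch.nn.Linear(16, 3))
meta_opt = torch.optim.Adam(weigher.parameters(), lr=1e-3)

def student_logits(x, W, b):
    return x @ W.t() + b

for step in range(100):
    # Synthetic stand-ins for a training batch, a visual teacher, and
    # linguistic-constraint targets (all assumptions).
    x, y = torch.randn(64, feat_dim), torch.randint(0, num_classes, (64,))
    visual_teacher = torch.randn(64, num_classes)
    linguistic_teacher = torch.randn(64, num_classes)

    logits = student_logits(x, W, b)
    losses = torch.stack([
        F.cross_entropy(logits, y),                       # task loss
        F.kl_div(F.log_softmax(logits, -1),               # visual KD loss
                 F.softmax(visual_teacher, -1), reduction="batchmean"),
        F.kl_div(F.log_softmax(logits, -1),               # linguistic loss
                 F.softmax(linguistic_teacher, -1), reduction="batchmean"),
    ])
    weights = F.softmax(weigher(losses.detach().unsqueeze(0)).squeeze(0), -1)
    weighted_loss = (weights * losses).sum()

    # Virtual (differentiable) student update for the meta-step.
    gW, gb = torch.autograd.grad(weighted_loss, (W, b), create_graph=True)
    W_virt, b_virt = W - lr * gW, b - lr * gb

    # A held-out meta batch evaluates the virtual student; its gradient
    # flows back into the weighting module through the learned weights.
    xm, ym = torch.randn(64, feat_dim), torch.randint(0, num_classes, (64,))
    meta_loss = F.cross_entropy(student_logits(xm, W_virt, b_virt), ym)
    meta_params = list(weigher.parameters())
    for p, g in zip(meta_params, torch.autograd.grad(meta_loss, meta_params)):
        p.grad = g
    meta_opt.step()

    # Real student update with the (now detached) gradients.
    with torch.no_grad():
        W -= lr * gW.detach()
        b -= lr * gb.detach()

    if step % 20 == 0:
        print(f"step {step:3d} weights {weights.detach().numpy().round(3)}")
```

The second-order term in the meta-gradient comes from `create_graph=True` in the inner `autograd.grad` call; detaching those same gradients for the real student update keeps the two updates decoupled.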

DOI 10.1145/3474085.3481034
Language English
Journal Proceedings of the 29th ACM International Conference on Multimedia
