Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security | 2021

On the Robustness of Backdoor-based Watermarking in Deep Neural Networks


Abstract


Watermarking algorithms have been introduced in recent years to protect deep learning models against unauthorized redistribution. We investigate the robustness and reliability of state-of-the-art deep neural network watermarking schemes. We focus on backdoor-based watermarking and propose two simple yet effective attacks -- a black-box and a white-box attack -- that remove these watermarks without any labeled ground-truth data. Our black-box attack steals the model and removes the watermark using only API access to the victim's predicted labels. Our white-box attack efficiently removes the watermark when the parameters of the marked model are accessible, and steals the model up to twenty times faster than training a model from scratch. We conclude that these watermarking algorithms are insufficient to defend against redistribution by a motivated attacker.
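The black-box setting can be illustrated with a minimal model-extraction sketch. This is not the paper's implementation; it is a toy example in which a hypothetical victim classifier is queried only for hard labels, and the attacker distills those labels into a fresh surrogate model that mimics the victim without inheriting any watermark trigger behavior.

```python
# Toy sketch of black-box model extraction via label-only API access.
# The victim model, its weights, and all data below are illustrative
# assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "victim": a fixed linear classifier the attacker cannot inspect.
W_victim = rng.normal(size=(3, 2))          # 3 input features, 2 classes

def victim_api(x):
    """Black-box API: returns only hard labels, never parameters or scores."""
    return np.argmax(x @ W_victim, axis=1)

def train_surrogate(X, y, classes=2, lr=0.5, steps=500):
    """Fit a softmax-regression surrogate on the labels queried from the API."""
    W = np.zeros((X.shape[1], classes))
    Y = np.eye(classes)[y]                  # one-hot targets from API labels
    for _ in range(steps):
        logits = X @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (p - Y) / len(X)    # softmax cross-entropy gradient
    return W

X_query = rng.normal(size=(2000, 3))        # attacker's unlabeled query set
y_api = victim_api(X_query)                 # only labels cross the API boundary
W_surrogate = train_surrogate(X_query, y_api)

# The surrogate agrees with the victim on held-out inputs,
# even though the attacker never saw the victim's parameters.
X_test = rng.normal(size=(1000, 3))
agreement = np.mean(
    np.argmax(X_test @ W_surrogate, axis=1) == victim_api(X_test)
)
```

Because the surrogate is trained only on the victim's label responses to the attacker's own data, a backdoor watermark embedded in the victim's weights generally does not transfer, which is the intuition behind label-only watermark removal.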

DOI 10.1145/3437880.3460401
Language English
Journal Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security
