Embedding Watermarks into Deep Neural Networks
Yusuke Uchida (KDDI Research, Inc., Saitama, Japan), Yuki Nagai (KDDI Research, Inc., Saitama, Japan), Shigeyuki Sakazawa (KDDI Research, Inc., Saitama, Japan), Shin'ichi Satoh (National Institute of Informatics, Tokyo, Japan)
Abstract
Significant progress has been made with deep neural networks recently. Sharing trained models of deep neural networks has been very important in the rapid progress of research and development of these systems. At the same time, it is necessary to protect the rights to shared trained models. To this end, we propose to use digital watermarking technology to protect intellectual property and detect intellectual property infringement in the use of trained models. First, we formulate a new problem: embedding watermarks into deep neural networks. We also define requirements, embedding situations, and attack types on watermarking in deep neural networks. Second, we propose a general framework for embedding a watermark in model parameters, using a parameter regularizer. Our approach does not impair the performance of networks into which a watermark is placed because the watermark is embedded while training the host network. Finally, we perform comprehensive experiments to reveal the potential of watermarking deep neural networks as the basis of this new research effort. We show that our framework can embed a watermark during the training of a deep neural network from scratch, and during fine-tuning and distilling, without impairing its performance. The embedded watermark does not disappear even after fine-tuning or parameter pruning; the watermark remains complete even after 65% of the parameters are pruned.
1. Introduction
Deep neural networks have recently been significantly improved. In particular, deep convolutional neural networks (DCNN) such as LeNet [25], AlexNet [23], VGGNet [28], GoogLeNet [29], and ResNet [16] have become de facto standards for object recognition, image classification, and retrieval. In addition, many deep learning frameworks have been released that help engineers and researchers to develop systems based on deep learning or do research with less effort. Examples of these great deep learning frameworks are Caffe [19], Theano [3], Torch [7], Chainer [30], TensorFlow [26], and Keras [5]. Although these frameworks have made it easy to utilize deep neural networks in real applications, the training of deep neural network models is still a difficult task because it requires a large amount of data and time; several weeks are needed to train a very deep ResNet with the latest GPUs on the ImageNet dataset, for instance [16]. Therefore, trained models are sometimes provided on web sites to make it easy to try a certain model or reproduce the results in research articles without training. For example, Model Zoo provides trained Caffe models for various tasks with useful utility tools. Fine-tuning or transfer learning [28] is a strategy to directly adapt such already trained models to another application with minimum re-training time. Thus, sharing trained models is very important in the rapid progress of both research and development of deep neural network systems. In the future, more systematic model-sharing platforms may appear, by analogy with video sharing sites. Furthermore, some digital distribution platforms for purchase and sale of trained models or even artificial intelligence skills (e.g., Alexa Skills) may appear, similar to Google Play or the App Store.
In these situations, it is necessary to protect the rights to shared trained models. To this end, we propose to utilize digital watermarking technology, which is used to identify ownership of the copyright of digital content such as images, audio, and videos. In particular, we propose a general framework to embed a watermark in deep neural network models to protect intellectual property and detect intellectual property infringement of trained models. To the best of our knowledge, this is the first attempt to embed a watermark in a deep neural network. (Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo)

The contributions of this research are three-fold, as follows:

1. We formulate a new problem: embedding watermarks in deep neural networks. We also define requirements, embedding situations, and attack types for watermarking deep neural networks.
2. We propose a general framework to embed a watermark in model parameters, using a parameter regularizer. Our approach does not impair the performance of networks in which a watermark is embedded.
3. We perform comprehensive experiments to reveal the potential of watermarking deep neural networks.
2. Problem Formulation
Given a model network with or without trained parameters, we define the task of watermark embedding as embedding a T-bit vector b ∈ {0,1}^T into the parameters of one or more layers of the neural network. We refer to a neural network in which a watermark is embedded as a host network, and refer to the task that the host network is originally trying to perform as the original task. In the following, we formulate (1) requirements for an embedded watermark or an embedding method, (2) embedding situations, and (3) expected attack types against which embedded watermarks should be robust. Table 1 summarizes the requirements for an effective watermarking algorithm in the image domain [15, 8] and the neural network domain. While both domains share almost the same requirements, fidelity and robustness differ between the image and neural network domains. For fidelity in the image domain, it is essential to maintain the perceptual quality of the host image while embedding a watermark. However, in the neural network domain, the parameters themselves are not important; instead, the performance of the original task is important. Therefore, it is essential to maintain the performance of the trained host network, and not to hamper the training of a host network. Regarding robustness, as images are subject to various signal processing operations, an embedded watermark should stay in the host image even after these operations. The greatest possible modification to a neural network is fine-tuning or transfer learning [28]. An embedded watermark in a neural network should be detectable after fine-tuning or other possible modifications.
We classify the embedding situations into three types: train-to-embed, fine-tune-to-embed, and distill-to-embed, as summarized in Table 2.
Train-to-embed is the case in which the host network is trained from scratch while embedding a watermark, where labels for the training data are available.
Fine-tune-to-embed is the case in which a watermark is embedded while fine-tuning. In this case, model parameters are initialized with a pre-trained network. The network configuration near the output layer may be changed before fine-tuning.
Distill-to-embed is the case in which a watermark is embedded into a trained network without labels using the distilling approach [17]. Embedding is performed in fine-tuning where the predictions of the trained model are used as labels. In the standard distill framework, a large network (or multiple networks) is first trained, and then a smaller network is trained using the predicted labels of the large network in order to compress the large network. In this paper, we use the distill framework simply to train a network without labels. The first two situations assume that the copyright holder of the host network embeds a watermark into the host network during training or fine-tuning. Fine-tune-to-embed is also useful when a model owner wants to embed individual watermarks to identify those to whom the model has been distributed. By doing so, individual instances can be tracked. The last situation assumes that a non-copyright holder (e.g., a platform provider) is entrusted to embed a watermark on behalf of a copyright holder.
Related to the requirement for robustness in Section 2.1, we assume two types of attacks against which embedded watermarks should be robust: fine-tuning and model compression. These types of attack are very specific to deep neural networks, although one can easily imagine model compression by analogy with lossy image compression in the image domain.
Fine-tuning or transfer learning [28] seems to be the most feasible type of attack, because it reduces the burden of training deep neural networks. Many models have been constructed on top of existing state-of-the-art models. Fine-tuning alters the model parameters, and thus embedded watermarks should be robust against this alteration.
Model compression is very important in deploying deep neural networks to embedded systems or mobile devices, as it can significantly reduce memory requirements and/or computational cost. Lossy compression distorts model parameters, so we should explore how it affects the detection rate.
3. Proposed Framework
Table 1. Requirements for an effective watermarking algorithm in the image and neural network domains.

  Fidelity.
    Image domain: The quality of the host image should not be degraded by embedding a watermark.
    Neural network domain: The effectiveness of the host network should not be degraded by embedding a watermark.
  Robustness.
    Image domain: The embedded watermark should be robust against common signal processing operations such as lossy compression, cropping, resizing, and so on.
    Neural network domain: The embedded watermark should be robust against model modifications such as fine-tuning and model compression.
  Capacity (both domains): The effective watermarking system must have the ability to embed a large amount of information.
  Security (both domains): A watermark should in general be secret and should not be accessed, read, or modified by unauthorized parties.
  Efficiency (both domains): The watermark embedding and extraction processes should be fast.

Table 2. Three embedding situations. "Fine-tune" indicates whether parameters are initialized in embedding using already trained models; "Label availability" indicates whether labels for training data are available in embedding.

                        Fine-tune   Label availability
  Train-to-embed                          ✓
  Fine-tune-to-embed       ✓              ✓
  Distill-to-embed         ✓

In this section, we propose a framework for embedding a watermark into a host network. Although we focus on a
DCNN [25] as the host, our framework is essentially applicable to other networks such as the standard multilayer perceptron (MLP), recurrent neural network (RNN), and long short-term memory (LSTM) [18].
In this paper, a watermark is assumed to be embedded into one of the convolutional layers in a host DCNN. (Fully-connected layers could also be used, but we focus on convolutional layers here because fully-connected layers are often discarded in fine-tuning.) Let (S, S), D, and L respectively denote the size of the convolution filter, the depth of the input to the convolutional layer, and the number of filters in the convolutional layer. The parameters of this convolutional layer are characterized by the tensor W ∈ R^{S×S×D×L}. The bias term is ignored here. Let us think of embedding a T-bit vector b ∈ {0,1}^T into W. The tensor W is a set of L convolutional filters, and the order of the filters does not affect the output of the network if the parameters of the subsequent layers are appropriately re-ordered. In order to remove this arbitrariness in the order of filters, we calculate the mean of W over the L filters as W̄_ijk = (1/L) Σ_l W_ijkl. Letting w ∈ R^M (M = S × S × D) denote a flattened version of W̄, our objective is now to embed the T-bit vector b into w. It is possible to embed a watermark in a host network by directly modifying w of a trained network, as is usually done in the image domain. However, this approach degrades the performance of the host network in the original task, as shown later in Section 4.2.6. Instead, we propose embedding a watermark while training a host network for the original task, so that the existence of the watermark does not impair the performance of the host network in its original task. To this end, we utilize a parameter regularizer, which is an additional term in the original cost function for the original task. The cost function E(w) with a regularizer is defined as:

E(w) = E_0(w) + λ E_R(w),   (1)

where E_0(w) is the original cost function, E_R(w) is a regularization term that imposes a certain restriction on the parameters w, and λ is an adjustable parameter.
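The filter-averaging and flattening step described above can be sketched in a few lines (illustrative NumPy with toy shapes, not the authors' released code):

```python
import numpy as np

# Hypothetical convolutional kernel tensor W with S = 3, D = 16, L = 32:
# (filter height, filter width, input depth, number of filters).
S, D, L = 3, 16, 32
W = np.random.randn(S, S, D, L)

# Average over the L filters to remove the filter-ordering arbitrariness,
# then flatten to the vector w of length M = S * S * D.
w = W.mean(axis=3).flatten()

M = S * S * D
assert w.shape == (M,)
```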
A regularizer is usually used to prevent the parameters from growing too large. L2 regularization (or weight decay [24]), L1 regularization, and their combination are often used to reduce over-fitting of parameters for complex neural networks. For instance, E_R(w) = ||w||_2^2 in L2 regularization. In contrast to these standard regularizers, our regularizer imposes on the parameters w a certain statistical bias, as a watermark, in the training process. We refer to this regularizer as an embedding regularizer. Before defining the embedding regularizer, we explain how to extract a watermark from w. Given a (mean) parameter vector w ∈ R^M and an embedding parameter X ∈ R^{T×M}, watermark extraction is simply done by projecting w using X, followed by thresholding at 0. More precisely, the j-th bit is extracted as:

b_j = s(Σ_i X_ji w_i),   (2)

where s(x) is a step function:

s(x) = 1 if x ≥ 0, and 0 otherwise.   (3)

This process can be considered a binary classification problem with a single-layer perceptron (without bias). (Although this single-layer perceptron could be deepened into a multi-layer perceptron, we focus on the simplest one in this paper.) Therefore, it is straightforward to define the loss function E_R(w) for the embedding regularizer by using binary cross entropy:

E_R(w) = − Σ_{j=1}^{T} ( b_j log(y_j) + (1 − b_j) log(1 − y_j) ),   (4)

where y_j = σ(Σ_i X_ji w_i) and σ(·) is the sigmoid function:

σ(x) = 1 / (1 + exp(−x)).   (5)

We call this loss function an embedding loss function. It may be confusing that the embedding loss function is used to update w, not X, in our framework. In a standard perceptron, w is an input and X is a parameter to be learned. In our case, w is the embedding target and X is a fixed parameter. The design of X is discussed in the following section. This approach does not impair the performance of the host network in the original task, as confirmed in experiments, because deep neural networks are typically over-parameterized. It is well known that deep neural networks have many local minima, and that all local minima are likely to have an error very close to that of the global minimum [9, 6]. Therefore, the embedding regularizer only needs to guide the model parameters to one of many good local minima so that the final model parameters have an arbitrary watermark.

In this section we discuss the design of the embedding parameter X, which can be considered a secret key [15] in detecting and embedding watermarks. While X ∈ R^{T×M} can be an arbitrary matrix, it will affect the performance of an embedded watermark because it is used in both embedding and extraction of watermarks. In this paper, we consider three types of X: X_direct, X_diff, and X_random. X_direct is constructed so that one element in each row of X_direct is '1' and the others are '0'. In this case, the j-th bit b_j is directly embedded into a certain parameter w_î such that X_direct_jî = 1.
X_diff is created so that each row has one '1' element and one '-1' element, and the others are '0'. Using X_diff, the j-th bit b_j is embedded into the difference between w_i+ and w_i−, where X_diff_ji+ = 1 and X_diff_ji− = −1. Each element of X_random is independently drawn from the standard normal distribution N(0, 1). Using X_random, each bit is embedded into all of the parameters w with random weights. These three types of embedding parameters are compared in experiments. Our implementation of the embedding regularizer is publicly available at https://github.com/yu4u/dnn-watermark.
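As a concrete illustration of Eqs. (2)-(5) and the three key types, the following NumPy sketch builds X_direct, X_diff, and X_random, and implements the embedding loss and watermark extraction. This is illustrative code with toy sizes, not the authors' released implementation; in particular, how the nonzero positions are chosen per row is an assumption here (the paper does not pin it down):

```python
import numpy as np

rng = np.random.default_rng(0)
T, M = 8, 64  # toy watermark length and parameter count


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def embedding_loss(w, X, b):
    """Binary cross entropy E_R(w) of Eq. (4), with y_j = sigmoid(sum_i X_ji w_i)."""
    y = sigmoid(X @ w)
    eps = 1e-12  # numerical safety only; not part of the paper's formulation
    return -np.sum(b * np.log(y + eps) + (1 - b) * np.log(1 - y + eps))


def extract_watermark(w, X):
    """Eq. (2): project w with the secret key X and threshold at 0."""
    return (X @ w >= 0).astype(int)


# X_direct: a single '1' per row -> each bit targets one parameter.
X_direct = np.zeros((T, M))
X_direct[np.arange(T), rng.choice(M, size=T, replace=False)] = 1.0

# X_diff: a '+1' and a '-1' per row -> each bit targets a parameter difference.
X_diff = np.zeros((T, M))
for j in range(T):
    i_plus, i_minus = rng.choice(M, size=2, replace=False)
    X_diff[j, i_plus], X_diff[j, i_minus] = 1.0, -1.0

# X_random: i.i.d. N(0, 1) entries -> each bit is spread over all parameters.
X_random = rng.standard_normal((T, M))

# Sanity check: extraction yields T bits and the loss is finite for any key.
w = rng.standard_normal(M)
for X in (X_direct, X_diff, X_random):
    bits = extract_watermark(w, X)
    assert bits.shape == (T,) and set(np.unique(bits)) <= {0, 1}
    assert np.isfinite(embedding_loss(w, X, bits))
```

During training, `embedding_loss` would be added to the original task loss as in Eq. (1), with gradients taken with respect to w while X stays fixed.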
4. Experiments
In this section, we demonstrate that our embedding regularizer can embed a watermark without impairing the performance of the host network, and that the embedded watermark is robust against various types of attack.
Datasets.
For experiments, we used the well-known CIFAR-10 and Caltech-101 datasets. The CIFAR-10 dataset [22] consists of 60,000 32 × 32 color images in 10 classes, with 6,000 images per class. These images were separated into 50,000 training images and 10,000 test images. The Caltech-101 dataset [10] includes pictures of objects belonging to 101 categories; it contains about 40 to 800 images per category. The size of each image is roughly 300 × 200 pixels, but we resized them to 32 × 32 for fine-tuning. For each category, we used 30 images for training and at most 40 of the remaining images for testing.

Host network and training settings.
We used the wide residual network [33] as the host network. The wide residual network is an efficient variant of the residual network [16]. Table 3 shows the structure of the wide residual network with a depth parameter N and a width parameter k. In all our experiments, we set N = 1 and k = 4, and used SGD with Nesterov momentum and cross-entropy loss in training. The initial learning rate was set at 0.1, weight decay to 5.0 × 10^-4, momentum to 0.9, and minibatch size to 64. The learning rate was dropped by a factor of 0.2 at 60, 120, and 160 epochs, and we trained for a total of 200 epochs, following the settings used in [33]. We embedded a watermark into one of the following convolutional layers: the second convolutional layer in the conv 2, conv 3, and conv 4 groups. In the following, we specify the location of the host layer by simply naming the conv 2, conv 3, or conv 4 group. In Table 3, the number M of parameters w is also shown for these layers. The parameter λ in Eq. (1) was set to 0.01. As a watermark, we embedded b = 1 ∈ {0,1}^T (the all-ones vector) in the following experiments.

First, we confirm that a watermark was successfully embedded in the host network by the proposed embedding regularizer. We trained the host network from scratch (train-to-embed) on the CIFAR-10 dataset with and without embedding a watermark. In the embedding case, a 256-bit watermark (T = 256) was embedded into the conv 2 group.

Table 3. Structure of the host network.

  Group    Output size   Building block                          M (size of w)
  conv 1   32 × 32       [3 × 3, 16]                             N/A
  conv 2   32 × 32       [3 × 3, 16×k; 3 × 3, 16×k] × N          576
  conv 3   16 × 16       [3 × 3, 32×k; 3 × 3, 32×k] × N          1,152
  conv 4   8 × 8         [3 × 3, 64×k; 3 × 3, 64×k] × N          2,304
           1 × 1         avg-pool, fc, soft-max                  N/A

Figure 1. Training curves for the host network on CIFAR-10 as a function of epochs. Solid lines denote test error (y-axis on the left) and dashed lines denote training loss E(w) (y-axis on the right).

Table 4. Test error (%) and embedding loss E_R(w) with and without embedding.

                 Test error   E_R(w)
  Not embedded   8.04         N/A
  direct         8.21
  diff           8.37
  random         7.97

Figure 1 shows the training curves for the host network on CIFAR-10 as a function of epochs.
Not embedded is the case in which the host network is trained without the embedding regularizer. Embedded (direct), Embedded (diff), and Embedded (random) respectively represent training curves with embedding regularizers whose parameters are X_direct, X_diff, and X_random. We can see that the training loss E(w) with a watermark becomes larger than in the not-embedded case if the parameters X_direct or X_diff are used. This large training loss is dominated by the embedding loss E_R(w), which indicates that it is difficult to embed a watermark directly into a parameter, or even into the difference of two parameters. On the other hand, the training loss of Embedded (random) is very close to that of Not embedded.

Table 4 shows the best test errors and embedding losses E_R(w) of the host networks with and without embedding. We can see that the test errors of Not embedded and random are almost the same, while those of direct and diff are slightly larger. The embedding loss E_R(w) of random is extremely low compared with those of direct and diff. These results indicate that the random approach can effectively embed a watermark without impairing performance in the original task.

Figure 2 shows the histogram of the embedded watermark σ(Σ_i X_ji w_i) (before thresholding) with and without watermarks, where (a) direct, (b) diff, and (c) random parameters are used in embedding and detection. If we binarize σ(Σ_i X_ji w_i) at a threshold of 0.5, all watermarks are correctly detected, because ∀j, σ(Σ_i X_ji w_i) ≥ 0.5 (⇔ Σ_i X_ji w_i ≥ 0) in all embedded cases. Please note that we embedded b = 1 ∈ {0,1}^T as the watermark, as mentioned before. Although random watermarks will be detected in the non-embedded cases, it can easily be judged that no watermark is embedded, because the distribution of σ(Σ_i X_ji w_i) is quite different from those of the embedded cases.

We explore how trained model parameters are affected by the embedded watermarks. Figure 3 shows the distribution of model parameters W (not w) with and without watermarks. These parameters are taken only from the layer in which a watermark was embedded. Note that W is the parameter before taking the mean over filters, and thus the number of parameters is 3 × 3 × 64 × 64. We can see that direct and diff significantly alter the distribution of parameters while random does not. In direct, many parameters became large, and a peak appears near 2, so that their mean over filters becomes a large positive value to reduce the embedding loss. In diff, most parameters were pushed in both positive and negative directions so that the differences between these parameters became large.
In random, a watermark is diffused over all parameters with random weights and thus does not significantly alter the distribution. This is one of the desirable properties of watermarking, related to the security requirement; one may become aware of the existence of the embedded watermarks in the direct and diff cases. The results so far indicate that the random approach is the best choice among the three, with low embedding loss, low test error in the original task, and no alteration of the parameter distribution. Therefore, in the following experiments, we used the random approach in embedding watermarks without explicitly indicating it.
Figure 2. Histogram of the embedded watermark σ(Σ_i X_ji w_i) (before thresholding) with and without watermarks: (a) direct, (b) diff, (c) random.

Figure 3. Distribution of model parameters W with and without watermarks: (a) not embedded, (b) direct, (c) diff, (d) random.

Figure 4. Training curves for fine-tuning the host network. The first and second halves of the epochs correspond to the first and second training. Solid lines denote test error (y-axis on the left) and dashed lines denote training loss (y-axis on the right).
In the above experiments, a watermark was embedded by training the host network from scratch (train-to-embed). Here, we evaluated the other two situations introduced in Section 2.2: fine-tune-to-embed and distill-to-embed. For fine-tune-to-embed, two experiments were performed. In the first experiment, the host network was trained on the CIFAR-10 dataset without embedding, and then fine-tuned on the same CIFAR-10 dataset with embedding and without embedding (for comparison). In the second experiment, the host network was trained on the Caltech-101 dataset, and then fine-tuned on the CIFAR-10 dataset with and without embedding. Table 5 (a) shows the results of the first experiment.
Not embedded 1st corresponds to the first training without embedding. Not embedded 2nd corresponds to the second training without embedding, and Embedded corresponds to the second training with embedding. Figure 4 shows the training curves of these fine-tunings. (Note that the learning rate was also initialized to 0.1 at the beginning of the second training, while it had been reduced to a much smaller value by the end of the first training.) We can see that Embedded achieved almost the same test error as Not embedded 2nd and a very low E_R(w).

Table 5. Test error (%) and embedding loss E_R(w) with and without embedding in fine-tuning and distilling.

  (a) Fine-tune-to-embed (CIFAR-10 → CIFAR-10)
                     Test error   E_R(w)
  Not embedded 1st   8.04         N/A
  Not embedded 2nd   7.66         N/A
  Embedded           7.70

  (b) Fine-tune-to-embed (Caltech-101 → CIFAR-10)
                     Test error   E_R(w)
  Not embedded 2nd   7.93         N/A
  Embedded           7.94

  (c) Distill-to-embed (CIFAR-10 → CIFAR-10)
                     Test error   E_R(w)
  Not embedded 1st   8.04         N/A
  Not embedded 2nd   7.86         N/A
  Embedded           7.75

Table 5 (b) shows the results of the second experiment. Not embedded 2nd corresponds to the second training without embedding, and Embedded corresponds to the second training with embedding. The test error and training loss of the first training are not shown because they are not comparable between the two different training datasets. From these results, it was also confirmed that Embedded achieved almost the same test error as Not embedded 2nd and a very low E_R(w). Thus, we can say that the proposed method is effective even in the fine-tune-to-embed situation (in both the same and different domains).

Finally, embedding a watermark in the distill-to-embed situation was evaluated. The host network was first trained on the CIFAR-10 dataset without embedding. Then, the trained network was further fine-tuned on the same CIFAR-10 dataset with and without embedding. In this second training, the training labels of the CIFAR-10 dataset were not used. Instead, the predicted values of the trained network were used as soft targets [17]. In other words, no label was used in the second training. Table 5 (c) shows the results for the distill-to-embed situation. Not embedded 1st corresponds to the first training, and
Embedded (Not embedded 2nd) corresponds to the second, distilling training with embedding (without embedding). It was found that the proposed method also achieved low test error and E_R(w) in the distill-to-embed situation.

In this section, the capacity of the embedded watermark is explored by embedding watermarks of different sizes into different groups in the train-to-embed manner. Please note that the numbers of parameters w of the conv 2, conv 3, and conv 4 groups were 576, 1,152, and 2,304, respectively. Table 6 shows the test error (%) and embedding loss E_R(w) for combinations of different embedded groups and different numbers of embedded bits. We can see that the embedding loss or the test error becomes high if the number of embedded bits becomes larger than the number of parameters w (e.g., 2,048 bits in conv 3), because the embedding problem becomes overdetermined in such cases. Thus, the number of embedded bits should be smaller than the number of parameters w, which is a limitation of the embedding method using a single-layer perceptron. This limitation could be resolved by using a multi-layer perceptron in the embedding regularizer.

Table 6. Test error (%) and embedding loss E_R(w) for combinations of embedded groups and numbers of embedded bits.

  (a) Test error (%)
  Embedded bits   conv 2   conv 3   conv 4
  256             7.97     7.98     7.92
  512             8.47     8.22     7.84
  1,024           8.43     8.12     7.84
  2,048           8.17     8.93     7.75

  (b) Embedding loss E_R(w)
  Embedded bits   conv 2   conv 3   conv 4
  256
  512
  1,024
  2,048

Table 7. Losses, test error, and bit error rate (BER) after embedding a watermark with different λ.

  λ     ||w − w0||   E_R(w)   Test error   BER
  0     0.000        1.066    8.04         0.531
  1     0.184        0.609    8.52         0.324
  10    1.652        0.171    10.57        0.000
  100   7.989        0.029    13.00        0.000
As mentioned in Section 3.2, it is possible to embed a watermark into a host network by directly modifying the trained parameters w, as is usually done in the image domain. Here we try to do this by minimizing the following loss function instead of Eq. (1):

E(w) = ||w − w0||_2^2 + λ E_R(w),   (6)

where the embedding loss E_R(w) is minimized while minimizing the difference between the modified parameters w and the original parameters w0. Table 7 summarizes the embedding results of minimizing Eq. (6) against the host network trained on the CIFAR-10 dataset. We can see that embedding fails for λ ≤ 1, as the bit error rate (BER) is larger than zero, while the test error of the original task becomes too large for λ ≥ 10. Thus, it is not effective to directly embed a watermark without considering the original task.

In this section, the robustness of the proposed watermark is evaluated against the two attack types explained in Section 2.3: fine-tuning and model compression.
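For reference, the direct-embedding baseline of Eq. (6) above can be sketched with plain gradient descent (illustrative NumPy, not the authors' code; the key X, the sizes, λ, the learning rate, and the step count are all toy assumptions). The gradient of the cross-entropy term with respect to w is Xᵀ(σ(Xw) − b):

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def embed_directly(w0, X, b, lam=10.0, lr=0.005, steps=5000):
    """Gradient descent on E(w) = 0.5 * ||w - w0||^2 + lam * E_R(w), cf. Eq. (6)."""
    w = w0.copy()
    for _ in range(steps):
        grad = (w - w0) + lam * (X.T @ (sigmoid(X @ w) - b))
        w -= lr * grad
    return w


rng = np.random.default_rng(1)
T, M = 8, 64
X = rng.standard_normal((T, M))      # secret key
w0 = 0.1 * rng.standard_normal(M)    # toy stand-in for trained parameters
b = rng.integers(0, 2, size=T)       # watermark bits

w = embed_directly(w0, X, b)
extracted = (X @ w >= 0).astype(int)
ber = float(np.mean(extracted != b))         # bit error rate after embedding
distortion = float(np.linalg.norm(w - w0))   # how far the parameters moved
```

As in Table 7, a larger λ drives the BER toward zero at the cost of a larger parameter distortion, which on a real network translates into a higher test error on the original task.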
Fine-tuning or transfer learning [28] seems to be the most likely type of (unintentional) attack, because it is frequently performed on trained models to apply them to other, similar tasks with less effort than training a network from scratch, or to avoid over-fitting when sufficient training data are not available. In this experiment, two trainings were performed; in the first training, a 256-bit watermark was embedded in the conv 2 group in the train-to-embed manner, and then the host network was further fine-tuned in the second training without embedding, to determine whether the watermark embedded in the first training remained in the host network even after the second training (fine-tuning).

Table 8. Embedding loss before fine-tuning (E_R(w)) and after fine-tuning (E'_R(w)), and the best test error (%) after fine-tuning.

                           E_R(w)   E'_R(w)   Test error
  CIFAR-10 → CIFAR-10
  Caltech-101 → CIFAR-10

Table 8 shows the embedding loss before fine-tuning (E_R(w)) and after fine-tuning (E'_R(w)), and the best test error after fine-tuning. We evaluated fine-tuning in the same domain (CIFAR-10 → CIFAR-10) and in different domains (Caltech-101 → CIFAR-10). We can see that, in both cases, the embedding loss was slightly increased by fine-tuning but was still low. In addition, the bit error rate of the detected watermark was zero in both cases. The reason the embedding loss for fine-tuning in different domains is higher than that in the same domain is that the Caltech-101 dataset is significantly more difficult than the CIFAR-10 dataset in our settings; all images in the Caltech-101 dataset were resized to 32 × 32 for compatibility with the CIFAR-10 dataset, which is extremely small compared with their original sizes (roughly 300 × 200). It is sometimes difficult to deploy deep neural networks to embedded systems or mobile devices because they are both computationally intensive and memory intensive. In order to solve this problem, the model parameters are often compressed [14, 12, 13].
The compression of model parameterscan intentionally or unintentionally act as an attack againstwatermarks. In this section, we evaluate the robustness ofour watermarks against model compression, in particular,against parameter pruning [14]. In parameter pruning, pa-rameters whose absolute values are very small are cut-off tozero. In [13], quantization of weights and the Huffman cod-ing of quantized values are further applied. Because quanti-zation has less impact than parameter pruning and the Huff-man coding is lossless compression, we focus on parameterpruning.In order to evaluate the robustness against parameterpruning, we embedded a 256-bit watermark in the conv This size is extremely small compared with their original sizes(roughly × ). group while training the host network on the CIFAR-10dataset. We removed α % of the × × × parametersof the embedded layer and calculated embedding loss andbit error rate. Figure 5 (a) shows embedding loss E R ( w ) as a function of pruning rate α . Ascending ( Descending )represents embedding loss when the top α % parameters arecut-off according to their absolute values in ascending (de-scending) order. Random represents embedding loss where α % of parameters are randomly removed. Ascending cor-responds to parameter pruning and the others were evalu-ated for comparison. We can see that the embedding loss of
Ascending increases more slowly than those of
Descending and
Random as α increases. It is reasonable that model parameters with small absolute values have less impact on the detected watermark because the watermark is extracted from the dot product of the model parameters w and the constant embedding parameter (weight) X.

Figure 5 (b) shows the bit error rate as a function of the pruning rate α. Surprisingly, the bit error rate was still zero after 65% of the parameters were removed, and remained very small even after 80% of the parameters were pruned (Ascending). We can say that the embedded watermark is sufficiently robust against parameter pruning because, in [13], the resulting pruning rates of convolutional layers ranged from 16% to 65% for AlexNet [23] and from 42% to 78% for VGGNet [28]. Furthermore, this degree of bit error can easily be corrected with an error-correcting code (e.g. the BCH code).

Figure 6 shows the histogram of the detected watermark σ(Σ_i X_{ji} w_i) after pruning at two pruning rates; the histogram of the detected watermark is also shown for a host network into which no watermark was embedded. We can see that many of the σ(Σ_i X_{ji} w_i) values are still close to one in the embedded case, which might be used as a confidence score in judging the existence of a watermark (zero-bit watermarking).
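The Ascending pruning attack and the bit-error-rate measurement can be sketched as follows. The weights here are a synthetic stand-in constructed (again via a pseudo-inverse) to carry the watermark, not an actually trained network, and the sizes are illustrative assumptions; the sketch only shows why zeroing small-magnitude weights perturbs the dot products X_j · w least.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M = 256, 2304                       # 256-bit mark; M = illustrative size of a mean-pooled filter
X = rng.standard_normal((T, M))        # fixed secret embedding matrix (detection key)
b = rng.integers(0, 2, size=T)
w = np.linalg.pinv(X) @ (6.0 * (2 * b - 1))   # toy "trained" weights carrying the watermark

def prune_ascending(w, alpha):
    """Parameter pruning: zero out the fraction alpha of weights with smallest |w|."""
    out = w.copy()
    out[np.abs(w) <= np.quantile(np.abs(w), alpha)] = 0.0
    return out

def bit_error_rate(w, X, b):
    detected = (X @ w > 0).astype(int)         # bit j = sign of the projection X_j . w
    return float(np.mean(detected != b))

for alpha in (0.2, 0.5, 0.65, 0.8):
    print(f"alpha={alpha:.2f}  BER={bit_error_rate(prune_ascending(w, alpha), X, b):.4f}")
```

In this toy setting, as in the experiment, the bit error rate stays at or near zero well past moderate pruning rates, because the pruned weights contribute only a small fraction of each projection's magnitude.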
5. Conclusions and Future Work
In this paper, we have proposed a general framework for embedding a watermark in deep neural network models to protect the rights to trained models. First, we formulated a new problem: embedding watermarks into deep neural networks. We also defined requirements, embedding situations, and attack types for watermarking deep neural networks. Second, we proposed a general framework for embedding a watermark in model parameters using a parameter regularizer. Our approach does not impair the performance of networks into which a watermark is embedded. Finally, we performed comprehensive experiments to reveal the potential of watermarking deep neural networks as the basis of this new problem. We showed that our framework could embed a watermark without impairing the performance of a deep neural network. The embedded watermark did not disappear even after fine-tuning or parameter pruning; the entire watermark remained even after 65% of the parameters were pruned.

[Figure 5: Embedding loss (a) and bit error rate (b) after pruning, as a function of the pruning rate, for the Ascending, Descending, and Random cut-off orders.]
Although we have obtained initial insights into the new problem of embedding a watermark in deep neural networks, many things remain as future work.
Watermark overwriting.
A third-party user may embed a different watermark in order to overwrite the original watermark. In our preliminary experiments, such watermark overwriting caused 30.9%, 8.6%, and 0.4% bit errors against watermarks in the conv 2, conv 3, and conv 4 groups, respectively, when 256-bit watermarks were additionally embedded. More robust watermarking against overwriting should be explored (e.g. non-linear embedding).

Compression as embedding.
Compressing deep neural networks is a very important and active research topic. While we confirmed in this paper that our watermark is very robust against parameter pruning, a watermark might also be embedded in conjunction with compressing models. For example, in [13], after parameter pruning, the network is re-trained to learn the final weights of the remaining sparse parameters. Our embedding regularizer could be used in this re-training to embed a watermark.

[Figure 6: Histogram of the detected watermark σ(Σ_i X_{ji} w_i) after pruning, for an embedded host network at two pruning rates and for a host network with no embedded watermark.]

Network morphism.
In [4, 32], systematic studies have been done on how to morph a well-trained neural network into a new one so that its network function is completely preserved for further training. This network morphism can constitute a severe attack against our watermark because it may be impossible to detect the embedded watermark if the topology of the host network is severely modified. We leave the investigation of how the embedded watermark is affected by network morphism for future work.
Steganalysis.
Steganalysis [27, 21] is a method for detecting the presence of secretly hidden data (e.g. steganography or watermarks) in digital media files such as images, video, audio, and, in our case, deep neural networks. Watermarks should ideally be robust against steganalysis. While we confirmed in this paper that embedding watermarks does not significantly change the distribution of model parameters, more exploration is needed to evaluate robustness against steganalysis. Conversely, developing effective steganalysis against watermarks for deep neural networks could be an interesting research topic.
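As a toy illustration of the kind of distributional check a steganalyst might run on model parameters, the following compares two weight samples with a two-sample Kolmogorov–Smirnov statistic. The Gaussian samples are synthetic stand-ins for the weights of watermarked and non-watermarked layers; real steganalysis [27, 21] uses far richer feature sets than this single statistic.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

rng = np.random.default_rng(0)
w_plain = 0.05 * rng.standard_normal(4096)    # stand-in: weights of a non-watermarked layer
w_marked = 0.05 * rng.standard_normal(4096)   # stand-in: weights of a watermarked layer

# A steganalyst would flag the model if this statistic were implausibly large.
print(ks_statistic(w_plain, w_marked))
```

If embedding leaves the weight distribution essentially unchanged, as we observed, such a test gives the steganalyst little to work with; a watermark that noticeably distorts the distribution would fail it.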
Fingerprinting.
Digital fingerprinting is an alternative to the watermarking approach for persistent identification of images [2], video [20, 31], and audio clips [1, 11]. In this paper, we focused on the former of these two important approaches. Robust fingerprinting of deep neural networks is another, complementary direction for protecting deep neural network models.

References

[1] X. Anguera, A. Garzon, and T. Adamek. MASK: Robust local features for audio fingerprinting. In
Proc. of ICME, 2012.
[2] J. Barr, B. Bradley, and B. T. Hannigan. Using digital watermarks with image signatures to mitigate the threat of the copy attack. In Proc. of ICASSP, pages 69–72, 2003.
[3] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: a CPU and GPU math expression compiler. In Proc. of the Python for Scientific Computing Conference (SciPy), 2010.
[4] T. Chen, I. Goodfellow, and J. Shlens. Net2Net: Accelerating learning via knowledge transfer. In Proc. of ICLR, 2016.
[5] F. Chollet. Keras. GitHub repository, 2015.
[6] A. Choromanska, M. Henaff, M. Mathieu, G. Arous, and Y. LeCun. The loss surfaces of multilayer networks. In Proc. of AISTATS, 2015.
[7] R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like environment for machine learning. In Proc. of NIPS Workshop on BigLearn, 2011.
[8] I. Cox, M. Miller, J. Bloom, J. Fridrich, and T. Kalker. Digital Watermarking and Steganography. Morgan Kaufmann Publishers Inc., 2nd edition, 2008.
[9] Y. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Proc. of NIPS, 2014.
[10] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In Proc. of CVPR Workshop on Generative-Model Based Vision, 2004.
[11] J. Haitsma and T. Kalker. A highly robust audio fingerprinting system. In Proc. of ISMIR, pages 107–115, 2002.
[12] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally. EIE: Efficient inference engine on compressed deep neural network. In Proc. of ISCA, 2016.
[13] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In Proc. of ICLR, 2016.
[14] S. Han, J. Pool, J. Tran, and W. J. Dally. Learning both weights and connections for efficient neural networks. In Proc. of NIPS, 2015.
[15] F. Hartung and M. Kutter. Multimedia watermarking techniques. Proceedings of the IEEE, 87(7):1079–1107, 1999.
[16] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proc. of CVPR, 2016.
[17] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. In Proc. of NIPS Workshop on Deep Learning and Representation Learning, 2014.
[18] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[19] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proc. of MM, 2014.
[20] A. Joly, C. Frelicot, and O. Buisson. Content-based video copy detection in large databases: a local fingerprints statistical similarity search approach. In Proc. of ICIP, pages 505–508, 2005.
[21] J. Kodovsky, J. Fridrich, and V. Holub. Ensemble classifiers for steganalysis of digital media. IEEE Trans. on Information Forensics and Security, 7(2):432–444, 2012.
[22] A. Krizhevsky. Learning multiple layers of features from tiny images. Tech Report, 2009.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. of NIPS, 2012.
[24] A. Krogh and J. A. Hertz. A simple weight decay can improve generalization. In Proc. of NIPS, 1992.
[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[26] M. Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467, 2016.
[27] L. Shaohui, Y. Hongxun, and G. Wen. Neural network based steganalysis in still images. In Proc. of ICME, 2003.
[28] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. of ICLR, 2015.
[29] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proc. of CVPR, 2015.
[30] S. Tokui, K. Oono, S. Hido, and J. Clayton. Chainer: a next-generation open source framework for deep learning. In Proc. of NIPS Workshop on Machine Learning Systems, 2015.
[31] Y. Uchida, M. Agrawal, and S. Sakazawa. Accurate content-based video copy detection with efficient feature indexing. In Proc. of ICMR, 2011.
[32] T. Wei, C. Wang, Y. Rui, and C. W. Chen. Network morphism. In Proc. of ICML, 2016.
[33] S. Zagoruyko and N. Komodakis. Wide residual networks. In Proc. of ECCV, 2016.