An Automated and Robust Image Watermarking Scheme Based on Deep Neural Networks
Xin Zhong, Pei-Chi Huang, Spyridon Mastorakis, Frank Y. Shih, Senior Member, IEEE
Abstract—Digital image watermarking is the process of embedding and extracting a watermark covertly on a cover-image. To dynamically adapt image watermarking algorithms, deep learning–based image watermarking schemes have attracted increased attention in recent years. However, existing deep learning–based watermarking methods neither fully apply the fitting ability to learn and automate the embedding and extracting algorithms, nor achieve the properties of robustness and blindness simultaneously. In this paper, a robust and blind image watermarking scheme based on deep learning neural networks is proposed. To minimize the requirement of domain knowledge, the fitting ability of deep neural networks is exploited to learn and generalize an automated image watermarking algorithm. A deep learning architecture is specially designed for image watermarking tasks, and it is trained in an unsupervised manner to avoid human intervention and annotation. To facilitate flexible applications, the robustness of the proposed scheme is achieved without requiring any prior knowledge or adversarial examples of possible attacks. A challenging case of watermark extraction from phone camera–captured images demonstrates the robustness and practicality of the proposal. The experiments, evaluation, and application cases confirm the superiority of the proposed scheme.
Index Terms—Image watermarking, automation, robustness, deep learning, convolutional neural networks.
I. Introduction

Digital image watermarking refers to the process of embedding and extracting information covertly on a cover-image. The data (i.e., the watermark) is hidden in a cover-image to create a to-be-transmitted marked-image. The marked-image does not visually reveal the watermark, and only authorized recipients can extract the watermark information correctly. The techniques of image watermarking can be applied to various applications. Based on different target scenarios, the watermark information can be presented in different forms; for example, the watermark can be random bits or electronic signatures for image protection and authentication [1], or hidden messages for covert communication [2]. In addition, the watermark can be encoded for different purposes, such as added security with encryption methods or restoring information integrity with error correction codes during a cyberattack [3].

For copyright protection, classic image watermarking research [4] focuses only on single-bit extraction, where the output indicates whether an image contains a watermark or not.
Xin Zhong, Pei-Chi Huang, and Spyridon Mastorakis are with the Department of Computer Science, University of Nebraska Omaha, Omaha, NE 68182 USA (e-mail: {xzhong, phuang, smastorakis}@unomaha.edu).
Frank Y. Shih is with the Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102 USA (e-mail: [email protected]).
To enable a wide range of applications, modern image watermarking research primarily focuses on multi-bit scenarios that extract the entire communicative watermark information [3], [5]. Typically, many factors should be considered in an image watermarking scheme, such as the fidelity of the marked-image and the watermark's undetectability to computer analysis. The proposed image watermarking scheme not only satisfies these factors but also treats robustness as its priority: the watermark should survive even if the marked-image is degraded or distorted. Ideally, a robust image watermarking scheme keeps the watermark intact under a designated class of distortion without the assistance of other techniques. However, in practice, the watermark is extracted approximately in many attacking scenarios, and various encoding methods can be applied for its restoration [6]. Achieving robustness is a major challenge in a blind image watermarking scheme, where extraction must be performed without any information about the original cover-image.

Due to the limited scope of manual design, traditional image watermarking methods encounter difficulties; for example, extraction can only tolerate certain types of distortions in robust watermarking schemes, or the watermark itself can only resist a limited range of computer analysis in undetectable watermarking schemes [7]. To overcome these drawbacks, incorporating deep learning into image watermarking has attracted increased attention in recent years [8]. Deep learning, as a representation learning method, has enabled significant improvements in computer vision through its ability to fit and generalize complex features.
The major advantage of deep learning methodologies for image watermarking is that they perform watermarking in a more adaptive manner, dynamically learning algorithms that extract both high- and low-level features for image watermarking from large multi-instance data.

Recent research on image watermarking tasks with deep neural networks has emerged [9], [10], [11], [12], but challenging issues remain. For example, it is difficult to fully utilize the fitting ability of deep neural networks to automatically learn and generalize both the watermark embedding and extracting processes. Also, labeling the ground truth for an image watermarking task can be ill-defined or time-consuming. Finally, achieving robustness and blindness simultaneously without prior knowledge of adversarial examples [12] remains unexplored.

To address the above challenges, we present an automated, blind, and robust image watermarking scheme based on deep learning neural networks. The contribution of this paper is threefold. First, the fitting ability of deep neural networks is exploited to automatically learn image watermarking algorithms, facilitating an automated system without requiring domain knowledge. Second, the proposed deep learning architecture can be trained in an unsupervised manner to reduce human intervention, which is suitable for image watermarking. Finally, experimental results demonstrate the robustness and accuracy of the proposed scheme without using any prior knowledge or adversarial examples of possible attacks.

The remainder of this paper is organized as follows. Related work is described in Section II. The proposed scheme is presented in Section III. Experiments and analysis are presented in Section IV. Applications of the proposed watermarking scheme are discussed in Section V. Finally, conclusions are drawn in Section VI.

This paper has been accepted for publication by the IEEE Transactions on Multimedia. The copyright is with the IEEE. DOI: 10.1109/TMM.2020.3006415

II. Related Work

This section provides a detailed analysis of recent reports. Table I shows an analytical comparison of our proposed scheme with the state-of-the-art deep learning–based image watermarking schemes.

In handcrafted watermarking algorithms, various optimization methods have been applied to adapt the embedding parameters, and this research direction has attracted attention in recent years [13], [14], [15]. Consequently, exploring the optimization ability of deep learning models for adaptive and automated image watermarking is of great interest. However, compared to the significant advancements in image steganography with deep neural networks [16], [17], deep learning–based image watermarking is still in its infancy.

Kandi et al. [8] applied two convolutional autoencoders to reconstruct a cover-image.
In a marked-image, the pixels produced by the first autoencoder indicate bits with the value of zero, and the pixels produced by the second autoencoder indicate bits with the value of one, hence developing a non-blind binary watermarking scheme. Vukotic et al. [9] developed a deep learning–based, single-bit watermarking scheme that embeds through designed adversarial images and extracts via the first layer of a trained deep learning model. Li et al. [10] embedded a watermark into the discrete cosine domain by traditional algorithms [4] and applied convolutional neural networks to facilitate the extraction.

Besides single-bit and multi-bit watermarking, attempts have also been reported in special scenarios. For example, for zero-watermarking, where a master share is sent separately from the image, Fierro-Radilla et al. [11] applied convolutional neural networks to extract the required features from the cover-image and linked these features with the watermark to create a master share. For the scenario of template-based watermarking, Kim et al. [18] embedded a handcrafted template using a classic additive method and estimated possible distortion parameters by comparing the extracted template to the original one with the help of convolutional neural networks. Thus far, the existing deep learning–based image watermarking schemes do not fully apply the fitting ability of deep neural networks to learn and generalize the embedding and extracting algorithms.

Furthermore, due to the fragility of deep neural networks [12], a modified image as an input to a trained deep learning model may cause failure. In other words, robustness is a major challenge in deep learning–based image watermarking because noise or modifications of the marked-image can result in extraction failures. Mun et al. [19] proposed to solve this issue by proactively including noisy marked-images as adversarial examples in the training phase.
However, enumerating all types of attacks and their combinations may not be practically feasible.

To the best of our knowledge, our proposed scheme is the first method that explores the ability of deep neural networks to automatically learn and generalize both watermark embedding and extracting algorithms, while achieving robustness and blindness simultaneously.

III. Proposed Image Watermarking Scheme

We revisit the typical design of an image watermarking scheme and present the overall architecture of our scheme in Section III-A. Then, we present the loss function design and the scheme objective in Section III-B. Finally, the detailed structure of the proposed model is described in Section III-C.

A. The Overview Architecture of the Proposed Scheme
The traditional design of an image watermarking scheme is shown in Fig. 1. A watermark (denoted as w) is embedded into a cover-image (denoted as c) to produce a marked-image (denoted as m) that looks similar to c and is transported through a communication channel. Then, the receiver extracts the watermark data (denoted as w*) from the received marked-image (denoted as m*), which could be a modified version of m if distortions or attacks occur during the transmission.

To embed w into c, typically, the first step is to project c into one of its feature spaces in the spatial, frequency, or other domains. Next, w is encoded and embedded into the feature space of c. The embedded feature space is projected back into the cover-image space to create a marked-image m. Inversely, watermark extraction projects the received marked-image m* to the same feature space and then extracts and decodes the watermark information. Based on different target applications, traditional image watermarking methods manually design the projection, embedding, extraction, encoding, and decoding functions. As design criteria, an image watermarking scheme often highlights its fidelity (i.e., high similarity between m and c) and robustness (i.e., keeping the integrity of w* when m* is distorted).

Traditional image watermarking methods perform competently through hand-designed algorithms; however, it remains challenging to automatically learn these algorithms without complete dependence upon careful design. To tackle this difficulty, we propose a novel scheme that develops a deep learning model to automatically learn and generalize the embedding, extraction, and invariance functions in image watermarking. Fig. 2 illustrates the overall architecture of the proposed scheme with some example images.
Given two input spaces that contain all possible watermark images and cover-images (W and C, respectively), a neural network µθ1 parameterized by θ1 is applied to learn a function that encodes W. W_f, the encoded space of W, not only enlarges

TABLE I:
An analytical comparison between the proposed scheme and state-of-the-art image watermarking methods applying deep neural networks.

Method | Learning Watermarking Algorithms? | Blind | Robust | Extraction Type
Kandi et al. [8] | No | No | Robust to common image-processing attacks | Multi-bit
Vukotic et al. [9] | Learning extraction | Yes | Robust to rotation, JPEG, and cropping | Single-bit
Li et al. [10] | No | No | No | Multi-bit
Fierro-Radilla et al. [11] | No | No | Robust to common image-processing attacks | Zero-watermarking
Kim et al. [18] | Assisting extraction | No | Focus on geometric attacks | Template-based watermarking
Mun et al. [19] | Learning extraction | Yes | Robust to all attacks enumerated during training | Multi-bit
Ours | Yes | Yes | Robust to common image-processing attacks | Multi-bit
Fig. 1:
The traditional design of an image watermarking scheme.

W to prepare for the next-step concatenation, but also brings some redundancy, decomposition, and perceivable randomness that help information protection and robustness. Like the embedding process in traditional watermarking, where an encoded w is inserted into a feature space of c, in the proposed scheme an embedder that takes W_f and C as inputs and produces the marked-image is fit by the neural network σθ2 parameterized by θ2. The marked-image space is named M. To handle possible distortions, a neural network τθ3 parameterized by θ3 is introduced to learn to convert M to its enlarged and redundant transformed-space T. After the transformation, T preserves information about W_f and rejects other irrelevant information, such as noise on M, thereby providing robustness. Finally, the inverse watermark reconstruction functions are fitted by two neural network components, ϕθ4 and γθ5 with trainable parameters θ4 and θ5, which extract W_f from T and decode W from W_f, respectively.

Comparing the proposed scheme in Fig. 2 with the traditional design in Fig. 1, one can observe that the neural network components fit and optimize the image watermarking process dynamically. Encoder and decoder networks (denoted as µθ1 and γθ5, respectively) fit the watermark encoding and decoding functions. An embedder (denoted as σθ2) projects C into a feature space, embeds W_f into the space, and projects back to the marked-image space M.
An extractor (denoted as ϕθ4) inverts all the processes in σθ2, and τθ3 handles the distortion during transmission through a communication channel. More details of the architectures are described in Section III-C.

Compared to an autoencoder [20], where an input space is transformed to a representative, intermediate feature space and the original input is recovered from this feature space, the proposed scheme takes two spaces C and W as inputs, produces an intermediate marked-image space M, and recovers W from M. The recovery ability of autoencoders, i.e., an exact reconstruction of the input with appropriate features extracted by the deep neural networks, secures the feasibility of watermark extraction in the proposed scheme. The reconstruction requires only M, without the need for W and C, enabling the blindness of the proposed scheme. A feature space in an autoencoder is often learned through a bottleneck for dimensionality compression, but the proposed scheme learns equal-sized or over-complete representations to achieve high robustness and accuracy of watermark extraction.

B. The Loss Function and Scheme Objective
The entire architecture is trained as a single deep neural network with several loss terms designed for image watermarking. Given the data samples w_i ∈ W, i = 1, 2, 3, ..., and c_i ∈ C, i = 1, 2, 3, ..., the proposed scheme can be trained in an unsupervised manner. There are two inputs, w_i and c_i, and two outputs, w*_i and m_i, in the proposed deep neural network. For the output w*_i, an extraction loss that minimizes the difference between w*_i and w_i is computed to ensure full extraction of the watermark. For the output m_i, a fidelity loss that minimizes the difference between m_i and c_i is computed to enable watermarking invisibility. For the output m_i, we also compute an information loss that forces m_i to contain the information of w_i. To achieve this, we maximize the correlation between the feature maps of w_if and the feature maps of m_i, where the feature maps are the outputs of convolutional layers in the proposed architecture; the feature maps of w_if and m_i, i.e., B1 and B2, are illustrated in Figs. 3 and 4. Denoting the parameters to be learned as ϑ = [θ1, θ2, θ3, θ4, θ5], the loss function L(ϑ) of the proposed scheme can be expressed as:

L(ϑ) = λ1 ‖w*_i − w_i‖1 + λ2 ‖m_i − c_i‖1 + λ3 ψ(m_i, w_if),   (1)

where λ1, λ2, and λ3 are the weights of the loss terms, and ψ is a function computing the correlation, given as:

ψ(m_i, w_if) = ‖g(B1(w_if)) − g(B1(m_i))‖1 + ‖g(B2(w_if)) − g(B2(m_i))‖1,   (2)

where g denotes the Gram matrix that contains all possible inner products. By minimizing the distance between the Gram matrices of the feature maps of m_i and w_if produced by the intermediate layer outputs B1 and B2, we maximize their correlation. As the feature producers, the annotation of B1 and B2 is presented in Fig. 4, and the convolution block B that

Fig. 2:
The overall architecture of the proposed image watermarking scheme.
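The information loss ψ in Eq. (2) compares Gram matrices of intermediate feature maps. The following is a minimal NumPy sketch of that computation; the Gram-matrix normalization by the number of spatial positions is our assumption (the text defines g only as the matrix of all inner products), and random arrays stand in for the learned feature maps B1 and B2.

```python
import numpy as np

def gram(feat):
    """Gram matrix of an (H, W, C) feature map: all pairwise channel inner products."""
    h, w, c = feat.shape
    flat = feat.reshape(h * w, c)
    return flat.T @ flat / (h * w)  # (C, C); the normalization is an assumption

def information_loss(maps_w, maps_m):
    """psi(m_i, w_if): summed L1 distances between the Gram matrices of the
    intermediate feature maps (B1, B2) of w_if and of m_i."""
    return sum(np.abs(gram(a) - gram(b)).sum() for a, b in zip(maps_w, maps_m))

# Identical feature maps give exactly zero information loss.
rng = np.random.default_rng(0)
f1, f2 = rng.random((8, 8, 4)), rng.random((8, 8, 4))
assert information_loss([f1, f2], [f1, f2]) == 0.0
```

Minimizing this quantity drives the Gram statistics of the marked-image features toward those of the encoded watermark, which is how the correlation in Eq. (2) is maximized.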
Fig. 3:
The detailed components of the proposed watermarking scheme: the Encoder µθ1, the Embedder σθ2, the Invariance Layer τθ3, the Extractor ϕθ4, and the Decoder γθ5. Every convolution block has the same structure, but only the block marked with "*" needs intermediate results to compute the loss.

contains B1 and B2 is annotated in Fig. 3. More discussion is presented in Section III-C.

In Eq. 1, each pair among the fidelity, information, and extraction loss terms involves a trade-off for image watermarking. For example, minimizing the fidelity loss term to zero means that m_i is identical to c_i. However, in that case there is no embedded information in m_i, so the extraction of w_i will fail. To allow some imperfectness in the loss terms, the mean absolute error (i.e., the L1 norm) is selected to highlight the overall performance rather than a few outliers.

With regularization, the proposed scheme objective is represented as L(ϑ) + λ4 P, where P is the penalty term that achieves robustness as in Eq. 6, and λ4 is the weight controlling the strength of the regularization term. The deep neural network needs to learn the parameters ϑ* that minimize L(ϑ) + λ4 P:

ϑ* = argmin_ϑ [L(ϑ) + λ4 P].   (3)

In backpropagation during training, the term λ1 ‖w*_i − w_i‖1 is applied by all components of the proposed architecture in their weight updates, while only µθ1 and σθ2 apply the terms λ2 ‖m_i − c_i‖1 and λ3 ψ(m_i, w_if) in their weight updates. This enables µθ1 and σθ2 to encode and embed the information in a way that ϕθ4 and γθ5 are able to extract and decode the watermark.

C. Detailed Structure of the Proposed Neural Network Model
This subsection describes the design of the major neural network components, µθ1, σθ2, ϕθ4, γθ5, and τθ3, in more detail. The overall design is modularized and illustrated in Fig. 3. If we single out the two pairs (µθ1, γθ5) and (σθ2, ϕθ4), we can see that each pair is conceptually symmetrical. The watermark is considered as a 32 × 32 binary image, and the cover-image as a 128 × 128 × 3 RGB image.
1) The Encoder µθ1 and the Decoder γθ5: Given the samples w_i, i = 1, 2, 3, ..., from the input space W, the encoder µθ1 learns a function that encodes W to its code W_f. Inversely, the decoder γθ5 learns the decoding function from W_f to W with samples w*_if, i = 1, 2, 3, .... The encoder µθ1 successively increases the 32 × 32 watermark to a 32 × 32 × 48 feature space, and the decoder γθ5 successively decreases the 32 × 32 × 48 feature space back to a 32 × 32 watermark. The 32 × 32 × 48 code is rearranged into a feature map w_if that has the same width and height as the cover-image, so that we can concatenate the feature map of w_if and c_i along their channel dimension. This design brings two benefits. First, each of w_if and c_i contributes equally to the 128 × 128 × 6 input of σθ2; thus, we evenly weigh the watermark and the cover-image. Second, this 32 × 32 to 32 × 32 × 48 increment introduces some redundancy, decomposition, and perceivable randomness to W, which helps information protection and robustness.
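The shape bookkeeping above works out exactly because 32·32·48 = 128·128·3 = 49,152. A short sketch of this rearrangement and concatenation; the exact rearrangement learned by the model is not specified in the text, so a plain reshape stands in for it:

```python
import numpy as np

rng = np.random.default_rng(0)
w_f = rng.random((32, 32, 48))     # encoded watermark code produced by the encoder
cover = rng.random((128, 128, 3))  # cover-image c_i

# 32*32*48 == 128*128*3, so the code can be rearranged to the cover-image size.
w_if = w_f.reshape(128, 128, 3)    # stand-in for the actual rearrangement

# Concatenating along the channel dimension yields the 128 x 128 x 6 embedder input,
# in which the watermark code and the cover-image carry equal channel weight.
embedder_input = np.concatenate([w_if, cover], axis=2)
assert embedder_input.shape == (128, 128, 6)
```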
2) The Embedder σθ2 and the Extractor ϕθ4: The embedder σθ2 applies the convolution block B to extract a 128 × 128 × 3 feature map of w_if, which is concatenated along the channel dimension with the cover-image. Directly applying c_i, while only applying a feature map of w_if, helps c_i to dominate the appearance. The 128 × 128 × 6 concatenation is then convolved to produce the marked-image m_i. The extractor ϕθ4 inverts the process with two successive convolution blocks.

To capture various scales of features for image watermarking, the inception residual block [21] is applied. It consists of a 1 × 1, a 3 × 3, and a 5 × 5 convolution whose outputs are concatenated along the channel dimension to form a 96-channel feature, and a 1 × 1 convolution is applied to convert the 96-channel feature back to the original input channel size for the summation in the residual connection. The architecture of a convolution block is shown in Fig. 4, where F_w, F_d, and F_c, respectively, denote the size of the height, width, and channel.

Fig. 4:
The architecture design of a convolution block. The block input/output size is denoted by the height (F_w), the width (F_d), and the channel (F_c), respectively.

All the convolution blocks in Fig. 3 have the same inception residual structure shown in Fig. 4. In the case of the "*" convolution block B of Fig. 3, the annotated intermediate results B1 and B2 of Fig. 4 are applied in Eq. 2. Specifically, block B extracts features not only from its input w_if in the architecture, but also from m_i. The annotated F_w × F_d × F_c feature maps are the intermediate results B1 and B2, respectively.
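The inception residual block described above can be sketched in plain NumPy as follows. The per-branch channel count of 32 (so that three branches concatenate to the stated 96 channels), the random weights, and the ReLU placement are our assumptions, and a naive stride-1, same-padding convolution stands in for the trained layers:

```python
import numpy as np

def conv2d(x, w):
    """Naive stride-1, zero-padded 'same' convolution.
    x: (H, W, C_in), w: (k, k, C_in, C_out) -> (H, W, C_out)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.empty(x.shape[:2] + (w.shape[3],))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.tensordot(xp[i:i + k, j:j + k], w, axes=3)
    return out

def inception_residual(x, rng):
    """1x1, 3x3, 5x5 branches -> concat to 96 channels -> 1x1 projection -> residual add."""
    c = x.shape[2]
    branches = [np.maximum(conv2d(x, rng.normal(0, 0.1, (k, k, c, 32))), 0)
                for k in (1, 3, 5)]            # 32 channels per branch (assumption)
    cat = np.concatenate(branches, axis=2)     # 96-channel multi-scale feature
    w_proj = rng.normal(0, 0.1, (1, 1, 96, c)) # 1x1 conv back to the input channel size
    return x + conv2d(cat, w_proj)             # summation in the residual connection

rng = np.random.default_rng(0)
x = rng.random((8, 8, 6))
y = inception_residual(x, rng)
assert y.shape == x.shape  # residual blocks preserve the input shape
```

The three kernel sizes see different receptive fields, which is what lets one block capture several feature scales at once.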
3) The Invariance Layer τθ3: The invariance layer is the key component providing robustness in the proposed image watermarking scheme. Using a fully-connected layer, τθ3 learns a transformation from the space M to an over-complete space T, where the neurons are activated sparsely. The idea is to redundantly project the most important information from M into T and to deactivate the neural connections to the areas of M irrelevant to the watermark, thus preserving the watermark even if noise or distortion modifies a part of M. As shown in Fig. 3, τθ3 converts a 3-color-channel instance m_i of M into an N-channel (N ≥ 3) instance t_i of T, where N is the redundancy parameter. Increasing N results in increased redundancy and decomposition in T, which provides a higher tolerance of errors in M and enhances robustness.

Based on the contractive autoencoder [22], τθ3 employs a regularization term that is obtained from the Frobenius norm of the Jacobian matrix of the layer's outputs with respect to its inputs. Mathematically, the regularization term P is given as:

P = Σ_{i,j} (∂h_j(X) / ∂X_i)²,   (4)

where X_i denotes the i-th input and h_j denotes the output of the j-th hidden unit of the fully-connected layer. Similar to a common gradient computation, the Jacobian matrix can be written as:

∂h_j(X) / ∂X_i = (∂A(ω_ji X_i) / ∂(ω_ji X_i)) ω_ji,   (5)

where A is an activation function and ω_ji is the weight between h_j and X_i. We set A to the hyperbolic tangent (tanh) for strong gradients and bias avoidance [23], and hence P can be computed as:

P = Σ_j (1 − h_j²)² Σ_i ω_ji².   (6)

If the value of P is minimized to zero, all weights ω in τθ3 will be zero, so that the output of τθ3 will always be zero no matter how we change the inputs X. Thus, minimizing P alone will cause the rejection of all information from the inputs m_i.
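For a tanh fully-connected layer h = tanh(Xω), Eq. (6) follows from Eq. (4) because ∂h_j/∂X_i = (1 − h_j²)ω_ji. A small sketch that computes P both ways and checks they agree; the layer sizes and random data are illustrative only:

```python
import numpy as np

def contractive_penalty(x, w):
    """P = sum_j (1 - h_j^2)^2 * sum_i w_ji^2 for h = tanh(x @ w), as in Eq. (6)."""
    h = np.tanh(x @ w)                       # hidden activations, shape (n_hidden,)
    return float(np.sum((1 - h ** 2) ** 2 * np.sum(w ** 2, axis=0)))

rng = np.random.default_rng(0)
x = rng.random(6)                            # one flattened input sample
w = rng.normal(0, 0.5, (6, 4))               # fully-connected weights omega

# Direct evaluation of Eq. (4): squared Frobenius norm of the Jacobian.
h = np.tanh(x @ w)
jacobian = w * (1 - h ** 2)                  # entry (i, j) = dh_j / dx_i
assert np.isclose(contractive_penalty(x, w), np.sum(jacobian ** 2))

# Driving P all the way to zero forces every weight to zero,
# which rejects all information from the input.
assert contractive_penalty(x, np.zeros((6, 4))) == 0.0
```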
Therefore, we place P as a regularization term in the total loss function to preserve the useful information related to the loss terms of image watermarking, while rejecting other noise and irrelevant information. In this way, we achieve robustness without prior knowledge of possible distortions.

Remarkably, each color channel in m_i is treated as a single input unit to significantly improve computational efficiency. For example, if we treated one pixel as an input, a 128 × 128 marked-image with N set to its smallest value 3 would imply 49,152 × 49,152 weights ω in τθ3, which is prohibitively large; treating each channel as a unit keeps the layer tractable and allows a larger N to enable higher redundancy for higher robustness.

IV. Experiments and Analysis

This section presents a quantitative and analytical evaluation of the proposed deep learning–based image watermarking scheme. Section IV-A introduces our data preparation, and Section IV-B presents the experimental design, training, and validation. To validate the proposed image watermarking approach, Section IV-C provides testing experiments on synthetic images, and Section IV-D shows the robustness under different distortions. A feasibility case study on the scenario of watermark extraction from phone camera pictures is also presented in Section IV-E.

A. Preparation of Datasets
The proposed deep learning–based image watermarking architecture was trained as a single deep neural network. ImageNet [24] was rescaled to size 128 × 128 with RGB channels and then used as the cover-images. The binary version of CIFAR [25], at its original size of 32 × 32, was used as the watermarks because the proposed architecture uses 32 × 32 watermarks. For validation, 10,000 images from each dataset that were not used during the training phase are separated. The testing is performed on 10,000 further images (rescaled to 128 × 128) that were unseen during training, verifying that the proposed scheme learns and generalizes the watermarking algorithms without over-fitting to the training samples.

B. Training, Validation and Testing of the Proposed Model
As described in Section III, the proposed image watermarking scheme is trained as a single deep neural network. The ADAM optimizer [27], which adopts a moving window in the gradient computation, is applied for its ability to continue learning after many epochs. The training and validation of the proposed scheme are shown in Fig. 5, where the values of the terms in the loss (Eq. 1) and the objective (Eq. 3) during 200 epochs are presented. During both training and validation, the terms T1 and T2 (defined in Fig. 5) in L(ϑ) converge smoothly below 0.015, and L(ϑ) + λ4 P converges below 0.03, indicating a proper fit. Term T1 has slightly more error because, when carrying the watermark, a marked-image cannot be completely identical to a cover-image. λ1, λ2, and λ3 are all set to 1 to weigh the terms equally, and λ4 is set to 0.01 as suggested in [22]. All layers apply the rectified linear unit (ReLU) as the activation function except the outputs (the marked-image and the watermark extraction), which use sigmoid to limit the range to (0, 1).

Fig. 5:
The loss values during training and validation.
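The testing evaluation that follows relies on two metrics, PSNR (defined in Eq. 7) and BER. Both can be sketched in a few lines; the 0.5 binarization threshold for the extracted watermark is our assumption:

```python
import numpy as np

def psnr(cover, marked):
    """Eq. (7): 10 log10( max(c_i)^2 / MSE(c_i, m_i) ), in dB."""
    mse = np.mean((cover - marked) ** 2)
    return 10 * np.log10(cover.max() ** 2 / mse)

def ber(watermark, extraction):
    """Fraction of error bits after binarizing the extracted watermark."""
    bits = (extraction >= 0.5).astype(int)   # 0.5 threshold is an assumption
    return float(np.mean(bits != watermark))

c = np.full((4, 4), 1.0)
m = c - 0.1                                  # uniform error: MSE = 0.01
assert np.isclose(psnr(c, m), 20.0)          # 10 log10(1 / 0.01) = 20 dB

w = np.array([0, 1, 1, 0])
w_star = np.array([0.1, 0.9, 0.2, 0.4])      # one of the four bits flips
assert ber(w, w_star) == 0.25
```

Higher PSNR means the marked-image is closer to the cover-image; lower BER means a more faithful extraction.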
At the testing phase, the peak signal-to-noise ratio (PSNR) and bit-error-rate (BER) are respectively used to quantitatively evaluate the fidelity of the marked-images and the quality of the watermark extraction. The PSNR is defined as:

PSNR = 10 log10 ( max(c_i)² / MSE(c_i, m_i) ),   (7)

where MSE is the mean squared error. The BER is computed as the percentage of error bits in the binarization of the watermark extraction w*_i. In the testing, the BER is zero, indicating that the original and the extracted watermarks are identical. The testing PSNR is 39.72 dB, indicating a high fidelity of the marked-images, so that the hidden information cannot be noticed by human vision. A few testing examples with various image content and color are presented in Fig. 6, where we can observe high fidelity and full extraction. The watermark codes w_if do not directly reveal information about the watermark, which shows the perceivable randomness and decomposition learned by the proposed model.

C. The Proposed Scheme on Synthetic Images
To further validate that the image watermarking task is properly generalized, the proposed scheme is applied to synthetic RGB cover-images and watermarks. The results of
Fig. 6:
A few testing examples of the proposed scheme with various image content and color.

the blank cover-images and the random bits are presented below.

Fig. 7 illustrates the scenario of embedding binary watermarks into synthetic blank cover-images of black, red, white, green, and blue colors, respectively. Although blank cover-images are not included in the training, the proposed scheme provides promising results. Applying blank cover-images is known to be extremely difficult in conventional watermarking methods due to the lack of psycho-visual information. However, unlike traditional methods that assign some unnoticeable portions of visual components as the watermark, the proposed deep learning model learned to apply the correlation between the features of space W_f and of M to indicate the watermark.

Fig. 7:
Embedding watermarks into blank cover-image examples. (a) and (b): the embedded and extracted watermarks; (c) and (d): the five blank cover- and marked-images.
Fig. 8 shows an example of embedding a randomly generated binary image into a natural cover-image. To test the application scenarios where the watermarks are encrypted to random bits (besides the displayed example), 10,000 randomly generated bit sets are tested on 10,000 cover-images from the testing dataset. The average BER is 0.

D. The Robustness of the Proposed Scheme
The robustness of the proposed scheme against different distortions of the marked-image is evaluated by analyzing
Fig. 8:
Embedding random bits as the watermark. (a) random bits; (b) a cover-image; (c) extraction; and (d) the marked-image.

the distortion tolerance range. Fig. 9 illustrates a few visual examples of the marked-images and their distortions.
Fig. 9:
A few visual examples of the marked-images (top row) and their distortions (bottom row). From left to right: Histogram Equalization, Gaussian Blur, Salt & Pepper Noise, and Cropping, respectively.
Due to the over-complete design and the invariance layer τθ3, the proposed scheme can tolerate distortions at a very high percentage (see Fig. 10 (b) and (d) for an example of large cropping).

Fig. 10:
An example of watermark extraction under large-percentage cropping. (a) an example marked-image; (b) cropped (a); (c) the original watermark; (d) extraction with τ; and (e) extraction without τ.

To demonstrate the importance of our core robustness provider, two control experiments extracting watermarks from distorted marked-images with and without τθ3 are conducted. Without τθ3, the proposed scheme cannot extract the correct watermark (one example is shown in Fig. 10), and the extraction from 10,000 attacked marked-images in testing yields an average BER as high as 42.46%, which illustrates the significance of τθ3 when compared to the results presented in Fig. 11.

With τθ3, distortions with swept-over parameters that control the attack strength are applied to the marked-images produced from the testing dataset. The watermark extraction BER caused by each distortion under each parameter is averaged over the testing dataset. The distortions with swept-over parameters versus the average
BER are plotted in Fig. 11. Since we focus on image-processing attacks, the responses of the proposed scheme against several challenging image-processing attacks are discussed. The proposed scheme shows a high tolerance range on these challenges, especially for cropping, salt-and-pepper noise, and JPEG compression. For example, the extracted watermarks have average BERs as low as 7.8%, 11.6%, and 12.3% under severe distortions including a cropping that discards 65% of the marked-image, a JPEG compression with a low quality factor of 10, and 90% salt-and-pepper noise. The attacks that randomly fluctuate pixel values across image channels show a higher BER, including Gaussian additive noise and random noise that sets a random pixel to a random value. These extreme attacks can easily destroy most of the content of the marked-image (see a few examples in Fig. 12). Still, the proposed system achieves good performance when the marked-image content is decently preserved, such as 14% BER on 11% random noise.
Fig. 11: Distortions with swept-over parameters versus the average BER.
Fig. 12: An example of extreme distortions. (left) the marked-image; (middle) after Gaussian additive noise with variance 0.2; and (right) after 20% random noise.
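The sweep procedure described above can be sketched in a few lines. The sketch below is a minimal NumPy illustration, where `extract_fn` and `distort_fn` are hypothetical stand-ins for the trained extractor and a parameterized attack; the names are ours, not the paper's.

```python
import numpy as np

def bit_error_rate(extracted, original):
    """Fraction of watermark bits that differ from the ground truth."""
    extracted = np.asarray(extracted, dtype=bool)
    original = np.asarray(original, dtype=bool)
    return float(np.mean(extracted != original))

def sweep_distortion(extract_fn, distort_fn, marked_images, watermarks, params):
    """Average the extraction BER over a test set for each attack strength."""
    avg_ber = {}
    for p in params:
        bers = [bit_error_rate(extract_fn(distort_fn(img, p)), wm)
                for img, wm in zip(marked_images, watermarks)]
        avg_ber[p] = float(np.mean(bers))
    return avg_ber
```

Plotting `params` against the values of `avg_ber` reproduces the kind of robustness curves shown in Fig. 11.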
As discussed in Table I, Mun's scheme [19] has the closest purpose to ours because it also achieves blindness and robustness simultaneously. Hence, we further compare our scheme with Mun's scheme for analysis. The comparison is performed on the same cover-image sets and the same watermark images as reported in Mun's scheme. To analyze the robustness of the proposed scheme, the extraction BERs under common image-processing attacks are shown in Table II. The proposed scheme shows its advantages by covering more distortion categories of image-processing attacks and obtaining a lower BER under the same distortion parameters. Although Mun's method can tolerate geometric distortions while the proposed scheme cannot, Mun's method requires the presence of the distortions in the training phase for robustness. In the real world, there is no way to predict and enumerate all kinds of attacks.
E. A Case Study: Feasibility Test on Watermark Extraction from Camera Resamples
Currently, image watermarking applications are much different than they used to be when they were first developed. Early scenarios focus on copyright detection, while more recent real-world communication requirements introduce a challenging use-case: watermark extraction from phone camera resamples of marked-images [28]. The challenges in this use-case arise because the watermark extraction needs to handle multiple combinations of distortions, such as optical tilt, quality degradation, compression, lens distortions, and lighting variation. Most existing approaches [29], [30], [31], [32] focus on resamples of printed marked-images, not on phone resamples of a computer monitor screen. This use-case typically involves additional distortions, such as the Moiré pattern (i.e., the RGB ripple), the refresh rate of the screen, and the spatial resolution of the monitor (some examples are mentioned in Fig. 13). We applied the proposed scheme as a major component in such scenarios since it is designed to reject all irrelevant noise instead of focusing on certain types of attacks. The outline of our scenario is shown in Fig. 13.
Fig. 13: The phone camera testing scenario.
An information provider prepares the information by encoding it through an Error Correction Coding (ECC) technique. Although trained as an entire neural network, the proposed scheme is separated into the embedding components (µθ and σθ) and the extracting components (τθ, ϕθ, and γθ). The marked-image can be obtained by embedding the encoded watermark into the cover-image using the trained embedding components. The marked-image, which looks identical to the cover-image, is distributed online and displayed on the user's screen. A user scans the marked-image with a phone to extract the hidden watermark through the extracting components. The distortions occurring in this test can be divided into two categories: perspective and image-processing distortions. The major function of the proposed scheme in this scenario is to overcome the pixel-level modifications coming from image-processing distortions such as compression, interpolation errors, and the Moiré pattern. To concentrate the test on the proposed scheme, we simplified the solution to the perspective distortions, although this can be an entire challenging research track [33], [34]. With this setup, we develop a prototype for a user study. Fig. 14 (a) illustrates the Graphical User Interface (GUI) and a 32 × 16 sample information. The classic Reed-Solomon (RS) code [35] is adopted as the ECC to protect the information: RS( , ) is applied to protect each row of the 32 × 16 information.

Fig. 14: The prototype. (a) The GUI and sample information; and (b) the simple rectification.
In this case study, five volunteers were invited to take a total of 25 photos of some marked-images displayed on a 2, × ,440 screen using the cameras on their mobile phones. The volunteers' phones were Google Pixel 3, Samsung Galaxy S9, iPhone XR, Xiaomi 8, and iPhone X. All the photos were taken under office light conditions. Volunteers were given two rules. First, the entire image should be placed as large as possible inside the ROI. As a prototype for demonstration, this rule facilitates our segmentation in that the largest contour inside the ROI is the marked-image, so that this application can focus on the test of the proposed system instead of complicated segmentation algorithms. In addition, placing the image large in the ROI helps capture the desired details and features for the watermark extraction. Second, the camera should be kept as stable as possible. Although the proposed system tolerates some blurring effects, it is not designed to extract watermarks in high-speed motion. Fig. 15 presents five watermark extractions, their BERs, and the corresponding ROIs. The closer up the photo is taken, the lower the error. Also, a lower error was observed with a greater parallel angle between the camera and the screen. The flashlight brings more errors due to over- or under-exposure of some image areas. Use of the flashlight in this application is optional because the screen has back-lights. The average BER was 5.13% for the 25 images.
TABLE II: A quantitative comparison between the proposed scheme and Mun's scheme.
Method | HE | JPEG 10 | Cropping 20% | S&P 5% | G. F. 10% | PSNR (dB) | Capacity (bits)
Mun's | N/A | N/A | N/A | | | |

BER (%) is reported under each distortion. N/A denotes "Not Applicable": the robustness against that attack was not covered in [19]. HE denotes histogram equalization; JPEG 10 denotes a JPEG compression with quality factor 10; S&P denotes the salt-and-pepper noise; G. F. denotes Gaussian filtering.
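Table II reports PSNR as the imperceptibility metric between the cover-image and the marked-image. As a reference for how that figure is conventionally computed (this is the standard definition, not code from the paper), a minimal sketch:

```python
import numpy as np

def psnr(cover, marked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diff = np.asarray(cover, dtype=float) - np.asarray(marked, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Higher PSNR means the embedding perturbs the cover-image less; values around 40 dB are commonly considered visually imperceptible.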
Fig. 15: Five examples of the watermark extractions before ECC and their ROIs. The left-hand side of each picture is the ROI, and the right-hand side is the extracted watermark. The BERs of the extractions from left to right are 3.71%, 4.98%, 1.07%, 4.30%, and 8.45%, respectively.
For a visual comparison, the displayed watermark extractions are the raw results before error correction. After executing RS( , ), all the watermark extractions in the testing cases can be restored to the original information without errors, as shown in Fig. 14. The proposed scheme can successfully extract the watermark within one second because it only applies the trained weights on the rectified marked-image.

V. Applications of the Proposed Watermarking Scheme

In this section, we discuss different application scenarios where the proposed image watermarking scheme can be applied: (i) authorized IoT device onboarding in Section V-A; (ii) creation of private communication channels in Section V-B; and (iii) authorized access to privileged content and services in Section V-C. We assume that a watermark is created based on credentials or secrets provided by users (e.g., passwords, cryptographic keys, or fingerprint scans). Our scenarios can also utilize the watermark extraction technique from camera resamples that we presented in Section IV-E.

A. Authorized IoT Device Onboarding
IoT devices need to be "onboarded" once they are bought by users, in the sense of being associated with a home controller or some cloud service so that users can control them [36], [37]. Currently, the widely used methods to onboard IoT devices are mostly out-of-band and include the use of QR codes physically printed on devices, pin codes, and serial numbers [38]. For example, once a user buys a smart IoT camera, he/she scans a QR code printed on the camera (or the packaging) with his/her mobile phone and, through a mobile application, connects the camera with a cloud service usually offered by its manufacturer. In this way, the user is able to watch the video stream captured by the camera. Existing onboarding methods do not protect against unauthorized access. For example, attackers that have physical access to an IoT device can tamper with it (e.g., install malware on the device) before the device is onboarded by its owner. The proposed image watermarking mechanism can be utilized to enable the onboarding of IoT devices only by the device owner. For example, user credentials can be embedded into a QR code, which will be physically printed on a device. Once a user receives his/her IoT device (i.e., an IoT camera with a QR code that has the user's credentials embedded), he/she takes a picture of the QR code with his/her mobile phone. To onboard the device, the user sends the taken QR picture along with his/her credentials to a server that runs the extraction via a deep neural network. The deep neural network verifies that the user indeed possesses the credentials (watermark) embedded into the QR code and authorizes the user to onboard the IoT device.
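The server-side check in this flow amounts to comparing the credential recovered from the QR watermark against the credential the user presents. A minimal sketch of that comparison, assuming SHA-256 digests and a constant-time comparison; the function name and digest choice are our assumptions, not part of the paper's protocol:

```python
import hashlib
import hmac

def onboarding_authorized(extracted_watermark: bytes,
                          presented_credential: bytes) -> bool:
    """Authorize onboarding only if the credential recovered from the QR
    watermark matches the one the user presents. Digests are compared in
    constant time to avoid leaking information through timing."""
    recovered = hashlib.sha256(extracted_watermark).digest()
    presented = hashlib.sha256(presented_credential).digest()
    return hmac.compare_digest(recovered, presented)
```

The same comparison pattern applies to the chatroom and privileged-content scenarios below, with the set of embedded credentials replacing the single owner credential.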
B. Creation of Private Communication Channels
The proposed image watermarking scheme can be used for the creation of private chatrooms and other communication channels. For instance, a chatroom organizer can collect the credentials of the individuals that he/she would like to communicate with and create a QR code with this set of credentials embedded into the code. The created QR code can be uploaded to the Internet. Once an individual who has been included in the communication group scans the QR code with his/her mobile phone and provides his/her credential to the deep neural network, the network verifies that this individual indeed possesses credentials embedded into the QR code and authorizes the user to join the chatroom. Unauthorized Internet users (i.e., users that do not have their credentials embedded into the QR code watermark) might try to join the chatroom, but they will not be able to do so: even if such users take a photo of the QR code, they do not possess credentials that are embedded into this code, so the deep neural network will reject their requests to join the chatroom. The QR code with the embedded watermark will be publicly available and can be accessed by all Internet users; however, only the authorized users will be able to join the chatroom and communicate with each other.
C. Authorized Access to Privileged Content and Services
The proposed image watermarking scheme can also be utilized for a broader scope of applications where access to privileged content and/or services is desirable. The content producer or service provider will create cover-images that have the credentials of authorized users embedded. As a result, image watermarking can be used as an access control mechanism, where access to certain pieces of (privileged) content and/or services is restricted only to authorized users, i.e., users that have their credentials (watermark) embedded into the marked-image. Similarly, when authorized users send the marked-image and the embedded credentials to the deep neural network, the network will be able to extract the watermark only if the user possesses the proper credentials. Only users that possess the credentials (watermark) embedded into the marked-image will be allowed to access the privileged content.

VI. Conclusion

This paper introduces an automated and robust image watermarking scheme using deep convolutional neural networks. The proposed blind image watermarking scheme exploits the fitting ability of deep neural networks to generalize image watermarking algorithms, presents an architecture that trains in an unsupervised manner for watermarking tasks, and achieves its robustness property without requiring prior knowledge of possible distortions on the marked-image. Experimentally, we have not only reported promising performance against individual common attacks, but also demonstrated that the proposed scheme has the ability and the potential to support combinative, cutting-edge, and challenging camera applications, which confirms the superiority of the proposed scheme. Our future work includes tackling geometric and perspective distortions with the deep neural networks inside the scheme, and refining the scheme's architecture, objective, and loss function through methods such as ablation studies.

References

[1] H. Berghel and L.
O'Gorman, "Protecting ownership rights through digital watermarking," Computer, vol. 29, no. 7, pp. 101–103, 1996.
[2] I. J. Cox, M. L. Miller, and A. L. McKellips, "Watermarking as communications with side information," Proceedings of the IEEE, vol. 87, no. 7, pp. 1127–1141, 1999.
[3] F. Y. Shih, Digital Watermarking and Steganography: Fundamentals and Techniques. CRC Press, 2017.
[4] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Transactions on Image Processing, vol. 6, no. 12, pp. 1673–1687, 1997.
[5] I. Cox, M. Miller, J. Bloom, J. Fridrich, and T. Kalker, Digital Watermarking and Steganography. Morgan Kaufmann, 2007.
[6] X. Kang, J. Huang, Y. Q. Shi, and Y. Lin, "A DWT-DFT composite watermarking scheme robust to both affine transform and JPEG compression," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 8, pp. 776–786, 2003.
[7] S. Craver, N. Memon, B.-L. Yeo, and M. M. Yeung, "Resolving rightful ownerships with invisible watermarking techniques: Limitations, attacks, and implications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 4, pp. 573–586, 1998.
[8] H. Kandi, D. Mishra, and S. R. S. Gorthi, "Exploring the learning capabilities of convolutional neural networks for robust image watermarking," Computers & Security, vol. 65, pp. 247–268, 2017.
[9] V. Vukotic, V. Chappelier, and T. Furon, "Are deep neural networks good for blind image watermarking?" IEEE, 2018, pp. 1–7.
[10] D. Li, L. Deng, B. B. Gupta, H. Wang, and C. Choi, "A novel CNN based security guaranteed image watermarking generation scenario for smart city applications," Information Sciences, vol. 479, pp. 432–447, 2019.
[11] A. Fierro-Radilla, M. Nakano-Miyatake, M. Cedillo-Hernandez, L. Cleofas-Sanchez, and H. Perez-Meana, "A robust image zero-watermarking using convolutional neural networks," IEEE, 2019, pp. 1–5.
[12] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, "The limitations of deep learning in adversarial settings," IEEE, 2016, pp. 372–387.
[13] Y. Huang, B. Niu, H. Guan, and S. Zhang, "Enhancing image watermarking with adaptive embedding parameter and PSNR guarantee," IEEE Transactions on Multimedia, vol. 21, no. 10, pp. 2447–2460, 2019.
[14] Z. Su, G. Zhang, F. Yue, L. Chang, J. Jiang, and X. Yao, "SNR-constrained heuristics for optimizing the scaling parameter of robust audio watermarking," IEEE Transactions on Multimedia, vol. 20, no. 10, pp. 2631–2644, 2018.
[15] Z. Chen, L. Li, H. Peng, Y. Liu, and Y. Yang, "A novel digital watermarking based on general non-negative matrix factorization," IEEE Transactions on Multimedia, vol. 20, no. 8, pp. 1973–1986, 2018.
[16] Z. Wang, N. Gao, X. Wang, X. Qu, and L. Li, "SSteGAN: Self-learning steganography based on generative adversarial networks," in International Conference on Neural Information Processing. Springer, 2018, pp. 253–264.
[17] S. Baluja, "Hiding images in plain sight: Deep steganography," in Advances in Neural Information Processing Systems, 2017, pp. 2069–2079.
[18] W.-H. Kim, J.-U. Hou, S.-M. Mun, and H.-K. Lee, "Convolutional neural network architecture for recovering watermark synchronization," arXiv preprint arXiv:1805.06199, 2018.
[19] S.-M. Mun, S.-H. Nam, H. Jang, D. Kim, and H.-K. Lee, "Finding robust domain from attacks: A learning framework for blind watermarking," Neurocomputing, vol. 337, pp. 191–202, 2019.
[20] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[21] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[22] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, "Contractive auto-encoders: Explicit invariance during feature extraction," in Proceedings of the 28th International Conference on Machine Learning. Omnipress, 2011, pp. 833–840.
[23] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient backprop," in Neural Networks: Tricks of the Trade. Springer, 2012, pp. 9–48.
[24] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[25] A. Krizhevsky, G. Hinton et al., "Learning multiple layers of features from tiny images," Citeseer, Tech. Rep., 2009.
[26] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision. Springer, 2014, pp. 740–755.
[27] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
Journal of Systems and Software, vol. 135, pp. 205–215, 2018.
[30] W.-g. Kim, S. H. Lee, and Y.-s. Seo, "Image fingerprinting scheme for print-and-capture model," in Pacific-Rim Conference on Multimedia. Springer, 2006, pp. 106–113.
[31] T. Yamada and M. Kamitani, "A method for detecting watermarks in print using smart phone: Finding no mark," in Proceedings of the 5th Workshop on Mobile Video. ACM, 2013, pp. 49–54.
[32] L. A. Delgado-Guillen, J. J. Garcia-Hernandez, and C. Torres-Huitzil, "Digital watermarking of color images utilizing mobile platforms," IEEE, 2013, pp. 1363–1366.
[33] M. Zhao, Y. Wu, S. Pan, F. Zhou, B. An, and A. Kaup, "Automatic registration of images with inconsistent content through line-support region segmentation and geometrical outlier removal," IEEE Transactions on Image Processing, vol. 27, no. 6, pp. 2731–2746, 2018.
[34] M. Jaderberg, K. Simonyan, and A. Zisserman, "Spatial transformer networks," in Advances in Neural Information Processing Systems, 2015, pp. 2017–2025.
[35] I. S. Reed and G. Solomon, "Polynomial codes over certain finite fields," Journal of the Society for Industrial and Applied Mathematics, vol. 8, no. 2, pp. 300–304, 1960.
[36] S. Mastorakis, A. Mtibaa, J. Lee, and S. Misra, "ICedge: When Edge Computing Meets Information-Centric Networking,"