Adaptive Control of Embedding Strength in Image Watermarking using Neural Networks
Mahnoosh Bagheri, Majid Mohrekesh, Nader Karimi, Shadrokh Samavi
Isfahan University of Technology, Isfahan, Iran
Abstract — Digital image watermarking is widely used in applications such as copyright protection of digital media, including audio, image, and video files. Watermarking methods pursue two opposing goals: robustness and transparency. In this paper, we propose a framework for determining an appropriate embedding strength factor. The framework can be used with most DWT- and DCT-based blind watermarking approaches. We use Mask R-CNN, pre-trained on the COCO dataset, to find a suitable strength factor for each sub-block. Experiments show that the method is robust against different attacks and has good transparency.
Keywords — watermark; strength factor; Mask R-CNN; discrete cosine transform; discrete wavelet transform.

I. INTRODUCTION
Digital image watermarking has received growing attention due to the rapid development of multimedia. It embeds a logo into a host image without causing a considerable visual change in the image. The logo can be copyright information, side information, or any other information that must be carried by the host image. A watermarking method can be blind or non-blind. Non-blind methods, such as that of [1], need side information to extract the watermark data, whereas blind methods do not. Embedding can be done in the spatial domain or in a transform domain. The embedded watermark should be robust against attacks and sufficiently transparent to meet the standard quality measures.

Different watermarking methods, with different capabilities and restrictions, have been proposed during the past few years. In [2], for example, suitable blocks of the host image are selected for embedding and a discrete cosine transform (DCT) is applied to them. Coefficients that have less effect on imperceptibility are then selected, and information is embedded in those coefficients using a threshold. Mairgiotis et al. [3] proposed a method that embeds in the transform domain. They use the Student's t distribution to find the transform coefficients, and they apply the method separately in the DCT and discrete wavelet transform (DWT) domains. Some papers cascade two transforms [4], [5], [6]. The work in [4] first applies DWT to the image and then DCT to the first-level LL sub-band, and chooses the lower-frequency coefficients of the result for embedding the watermark string. In [5], ROI detection is used to select the best host blocks, namely those lowest in an ROI ranking, and DCT is then applied to sub-blocks of the LH, HL, and HH sub-bands of their wavelet transform.
Here, as in [2], the authors choose a pair of coefficients that has a low impact on the human eye while still giving adequate robustness. Embedding is completed by conditionally swapping the coefficient pair according to each bit of the watermark stream. The distance between the two coefficient values is then increased to enhance the robustness of the embedding. In one of our previous works [6], we used a cascade of DWT followed by DCT after a color space transformation. The embedding in that work differs for each channel of the YUV color space: since the Y channel stores luminance data, we embed in only half of its capacity, whereas the full capacity of the U and V channels is used for embedding the bits of the watermark logo. In [7], after applying DWT to blocks of the image, an adaptive strength factor is used for embedding; it is computed as a function of the wavelet energy and the saliency of the image. Some works use optimization algorithms to calculate a better strength factor. In [8], for instance, the authors use the Cuckoo search algorithm in the wavelet domain to find the best strength factor.

The papers mentioned above rely on fairly simple methods without any serious use of artificial intelligence. Around 2012, neural networks (NNs) came into wide use, and researchers began to apply them in watermarking as well. Most papers, however, use neural networks for only some parts of their methods. For example, an NN is used in [9] to find the best strength factor. The authors apply their own NN after a four-level wavelet transform of the host image; the NN predicts a sequence of coefficients from which the best strength factor for embedding is chosen. Zheng et al. [10] applied DWT and then SVD to the host image and afterwards used a purpose-designed network, which is trained to embed the watermark data in the transformed image.
In [9] and [10], the authors both designed and trained a network for watermarking. Since many pre-trained networks are available, it seems reasonable to choose one of them and use it for embedding. Mask R-CNN [11] is a popular network used for semantic segmentation. It was pre-trained on the COCO dataset [12] and can segment 80 classes in the target image. Mask R-CNN is an extension of Faster R-CNN [13], which is pre-trained on the PASCAL dataset to detect objects based on hypotheses about object locations. In this paper, we propose a blind, NN-based method for calculating an adaptive strength factor. Our embedding phase consists of blocking, DWT, sub-blocking, and then DCT. The extraction phase is the inverse of the embedding phase. We also applied different attacks to the watermarked images to show the robustness of the proposed method.

This paper is organized as follows: the proposed method is given in Section II, the experimental results are shown in Section III, and the paper is concluded in Section IV.

II. PROPOSED METHOD
Our proposed watermarking method contains four main stages. In this section, we first explain the NN that segments the important parts of the image. We then describe how the strength factor is computed and how embedding is performed, and finally explain how the logo image is extracted from the watermarked image. Figure 1 shows our proposed scheme for adaptive image watermarking.

A. Mask R-CNN Network
Mask R-CNN is the successor of Faster R-CNN and is suited to semantic segmentation. The overall structure of Mask R-CNN is shown in Figure 2-a. The first stage finds object candidates and marks them with rectangular bounding boxes. This stage, called the Region Proposal Network (RPN), is a neural network that uses a spatial sliding window to map the image to a lower-dimensional vector. This vector is the input to two layers: the box-regression layer (reg layer) and the box-classification layer (cls layer). As shown in Figure 2-b, for K regions the cls layer and the reg layer produce 2×K scores and 4×K coordinates, respectively [13]. The second stage of Mask R-CNN predicts the class and box offset, and additionally outputs a mask, for each ROI [11]. In our proposed method, we use Mask R-CNN to obtain a mask for each class in the host image.

B. Compute Strength Factor
One of the outputs of Mask R-CNN is a set of masks for any given class. As a case study, we defined a table that consists of two classes: person and car. We chose images that include both a person and a car and extracted their masks with Mask R-CNN. As Figure 3 shows, each mask is multiplied by its coefficient, which can be set arbitrarily to suit any particular condition. The following equation produces $M_E$, the final embedding map:

$$M_E = k_\alpha \left(1 - \max_{i \in \{C, P\}} \left(c_i \times M_i\right)\right) \tag{1}$$

where $C$ and $P$ denote the classes Car and Person, respectively, and $c_i$ and $M_i$ are the coefficient and the mask of class $i$. $k_\alpha$ is a constant coefficient that scales $M_E$ for embedding. In an actual implementation of our method, the number of classes could be larger, to cover any number of desired objects and ROIs. The mean of the $M_E$ values corresponding to a sub-block is used as the strength factor of that sub-block.

C. Embedding Phase
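Since the embedding steps take the strength factor as an input, the embedding map of Equation 1 and the per-sub-block mean can first be sketched as follows. This is a minimal NumPy illustration, not the authors' MATLAB code: the function names, the toy masks, and the values of the coefficients and of k_α are hypothetical, and the sketch tiles the map directly, whereas the text does not spell out how the map is aligned with the wavelet-domain sub-blocks.

```python
import numpy as np

def embedding_map(masks, coeffs, k_alpha):
    """Eq. (1): M_E = k_alpha * (1 - max_i(c_i * M_i)).

    masks  : dict of class name -> binary mask (H x W, values in {0, 1})
    coeffs : dict of class name -> coefficient c_i (assumed here to lie in [0, 1])
    """
    weighted = np.stack([coeffs[name] * masks[name].astype(float) for name in masks])
    return k_alpha * (1.0 - weighted.max(axis=0))

def block_strengths(m_e, block=8):
    """Mean of M_E over each non-overlapping block x block tile."""
    h, w = m_e.shape
    tiles = m_e[: h - h % block, : w - w % block]
    tiles = tiles.reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))

# Toy 16x16 example: a "person" fills the top-left tile, a "car" the bottom-right tile.
masks = {
    "person": np.zeros((16, 16), dtype=np.uint8),
    "car": np.zeros((16, 16), dtype=np.uint8),
}
masks["person"][:8, :8] = 1
masks["car"][8:, 8:] = 1
coeffs = {"person": 1.0, "car": 0.5}   # illustrative values only

m_e = embedding_map(masks, coeffs, k_alpha=0.2)
strengths = block_strengths(m_e)
print(strengths)
```

In this toy case the tile covered by the person gets strength 0, the tile covered by the car gets 0.1, and the background tiles get the full 0.2, so a larger class coefficient protects the corresponding object more strongly.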
The first stage of our embedding method is similar to our previous work [5], which applies saliency detection to the grayscale host image to locate the blocks with minimum saliency as embedding candidates. Figure 1 shows the overall operation of embedding information in the host image. As in [5], we first resize all images to 512×512. We then compute the saliency of the grayscale host image, finding the ROI areas by semantic segmentation with Mask R-CNN. ROI areas rank lower among the host image blocks and are less likely to be used for embedding. Thus, after partitioning the host image into 128×128 blocks, the blocks with fewer ROI pixels become the embedding candidates. The first five 128×128 blocks are selected and two levels of DWT are applied to them; the LH, HL, and HH sub-bands are then partitioned into 8×8 sub-blocks. We apply DCT to each sub-block and embed one bit in each 8×8 sub-block. Redundancy can be used if there are more candidate sub-blocks than watermark bits. Embedding is controlled by the relative order of a pair of coefficients in the DCT domain. If the watermark bit is ‘1’, DCT(6,7) should be higher than DCT(7,6); otherwise, we swap them. For ‘0’, the rule is reversed, as in Equation 2:

$$\begin{cases} DCT(6,7) > DCT(7,6) & \text{if } w = 1 \\ DCT(6,7) < DCT(7,6) & \text{if } w = 0 \end{cases} \tag{2}$$

Fig 1. Block diagram of the proposed method.

D. Extracting Phase
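The extractor relies on the indexes of the blocks chosen during embedding, so that selection step is sketched here. This is a hedged NumPy illustration: `select_blocks` is a hypothetical helper, the ROI mask is a toy example, and breaking ties by raster order is an assumption that the text does not specify.

```python
import numpy as np

def select_blocks(roi_mask, block=128, count=5):
    """Return (row, col) indexes of the `count` blocks with the fewest ROI pixels."""
    h, w = roi_mask.shape
    rows, cols = h // block, w // block
    tiles = roi_mask[: rows * block, : cols * block].reshape(rows, block, cols, block)
    roi_pixels = tiles.sum(axis=(1, 3))          # ROI pixel count per block
    # Stable sort: blocks with equal counts keep raster order (an assumption).
    order = np.argsort(roi_pixels, axis=None, kind="stable")[:count]
    return [tuple(int(v) for v in np.unravel_index(i, roi_pixels.shape)) for i in order]

# Toy ROI mask for a 512x512 image: one object covering the top-left 256x256 area.
roi = np.zeros((512, 512), dtype=np.uint8)
roi[:256, :256] = 1

chosen = select_blocks(roi)
print(chosen)   # [(0, 2), (0, 3), (1, 2), (1, 3), (2, 0)]
```

As a rough capacity check, and assuming that only the second-level LH, HL, and HH sub-bands (32×32 each for a 128×128 block) carry data, five blocks give 5 × 3 × 16 = 240 sub-blocks of size 8×8. That equals 16 watermark bits × 15 copies, which agrees with the redundancy stated in the conclusion.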
The extraction of this method is blind and does not need the host image. When the watermarked image is delivered to the receiver, it is partitioned into 128×128 blocks, and the five selected blocks are identified by their indexes, which are provided as side information. DWT is then applied to each of them, and the LH, HL, and HH sub-bands are divided into 8×8 sub-blocks. We apply DCT to each sub-block, and the extracted bit $w'$ is obtained according to Equation 3:

$$w' = \begin{cases} 1 & \text{if } DCT(6,7) > DCT(7,6) \\ 0 & \text{if } DCT(6,7) < DCT(7,6) \end{cases} \tag{3}$$

When redundant watermark bits are embedded in the same image, we use a voting algorithm to extract the information, which improves the robustness of our method against attacks on the watermarked image.

III. EXPERIMENTAL RESULT
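Before the measurements, the coefficient-pair rule of Equations 2 and 3 and the voting step of Section II can be illustrated with a small round trip. This is a sketch under stated assumptions rather than the authors' implementation: random 8×8 arrays stand in for wavelet-domain sub-blocks (the DWT step is omitted), the strength factor is assumed to act as the minimum gap `alpha` enforced between the two coefficients, and the function names and noise level are hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix D, so that D @ x @ D.T is the 2-D DCT of x."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)
    return d

D = dct_matrix(8)
P, Q = (5, 6), (6, 5)   # DCT(6,7) and DCT(7,6), assuming 1-based indexing in the text

def embed_bit(block, bit, alpha):
    """Order the coefficient pair as in Eq. (2); the gap `alpha` is an assumption."""
    c = D @ block @ D.T
    hi, lo = max(c[P], c[Q]), min(c[P], c[Q])
    if hi - lo < alpha:                       # widen the gap symmetrically
        mid = (hi + lo) / 2.0
        hi, lo = mid + alpha / 2.0, mid - alpha / 2.0
    c[P], c[Q] = (hi, lo) if bit == 1 else (lo, hi)
    return D.T @ c @ D                        # inverse DCT (D is orthonormal)

def extract_bit(block):
    """Eq. (3): read the bit back from the ordering of the pair."""
    c = D @ block @ D.T
    return 1 if c[P] > c[Q] else 0

def majority_vote(bits):
    """Combine the redundant copies of one watermark bit."""
    return 1 if 2 * sum(bits) > len(bits) else 0

rng = np.random.default_rng(0)
watermark = [1, 0, 1, 1, 0, 0, 1, 0]
copies = 15
recovered = []
for bit in watermark:
    votes = []
    for _ in range(copies):
        sub_block = rng.normal(0.0, 10.0, size=(8, 8))
        marked = embed_bit(sub_block, bit, alpha=4.0)
        noisy = marked + rng.normal(0.0, 0.5, size=(8, 8))   # mild additive noise
        votes.append(extract_bit(noisy))
    recovered.append(majority_vote(votes))
print(recovered == watermark)
```

A larger `alpha` makes the ordering harder to flip under noise but changes the sub-block more, which is the robustness versus transparency trade-off that the adaptive strength factor is meant to balance.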
We test our method on images of the COCO dataset that contain both a person and a car, because our table includes these classes. The COCO dataset has about 82K images, from which we select 93 images for evaluating our method. The input images have different sizes, so we first resize all of them to 512×512. The 4×4 watermark bits, one of the input images, and its watermarked image are shown in Figure 4. Our method is implemented in MATLAB R2015a on a laptop with a Core i5-4200U CPU and 6 GB of RAM, and we run the pre-trained Mask R-CNN in a Jupyter Notebook. The watermark is a 4×4 logo that can be random, provided that half of its bits are 0 and the other half are 1, as in Figure 4-b.

A. Evaluation Parameters
Several parameters can be used to evaluate our method. They indicate its quality and allow us to compare our approach with other similar methods. One of them is the normalized correlation (NC),

$$NC = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} W(i, j) \times W'(i, j) \tag{4}$$

where $M$ and $N$ are the dimensions of the watermark, and $W(i, j)$ and $W'(i, j)$ are the embedded watermark and the extracted watermark, respectively. The second parameter is the Structural Similarity (SSIM),

$$SSIM = \frac{(2\mu_i \mu_j + c_1)(2\sigma_{ij} + c_2)}{(\mu_i^2 + \mu_j^2 + c_1)(\sigma_i^2 + \sigma_j^2 + c_2)} \tag{5}$$

where $\mu$ and $\sigma^2$ are the mean and variance, $\sigma_{ij}$ is the covariance of the two images, and $c_1$ and $c_2$ are constants. The Bit Error Rate (BER) is defined as

$$BER = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} W(i, j) \oplus W'(i, j)}{M \times N} \tag{6}$$

The last parameter that we use in this paper is the Peak Signal to Noise Ratio (PSNR), which measures the imperceptibility of the watermark in an image,

$$PSNR = 10 \log_{10} \frac{255^2}{MSE} \tag{7}$$

where MSE is the mean squared error between the host image and the watermarked image.

Fig 2. Structure of network, a) Mask R-CNN [11], b) Region Proposal Network (RPN) [13]
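The watermark and image metrics of this subsection can be sketched in NumPy as follows. The function names and toy data are hypothetical. Two assumptions are made: NC is computed on bits mapped from {0, 1} to {-1, +1}, which reproduces the relation NC = 1 − 2·BER visible in Table 1 (BER 0.0128 with NC 0.9745), and PSNR uses the standard definition for 8-bit images. SSIM is omitted for brevity.

```python
import numpy as np

def ber(w, w_ext):
    """Eq. (6): fraction of watermark bits that differ (XOR, then mean)."""
    w, w_ext = np.asarray(w), np.asarray(w_ext)
    return float(np.mean(w ^ w_ext))

def nc(w, w_ext):
    """Eq. (4), with bits mapped from {0, 1} to {-1, +1} (an assumption)."""
    a = 2.0 * np.asarray(w) - 1.0
    b = 2.0 * np.asarray(w_ext) - 1.0
    return float(np.mean(a * b))

def psnr(host, marked, peak=255.0):
    """Standard PSNR in dB for images with the given peak value."""
    diff = np.asarray(host, dtype=float) - np.asarray(marked, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

# 4x4 watermark with eight ones and eight zeros, as in the experiments.
w = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
w_ext = w.copy()
w_ext[0, 0] ^= 1                 # flip one of the 16 bits

host = np.full((8, 8), 100, dtype=np.uint8)
marked = host.copy()
marked[0, 0] = 108               # one pixel changed by 8 grey levels

print(ber(w, w_ext), nc(w, w_ext), psnr(host, marked))
```

With one flipped bit out of 16, the sketch gives BER = 0.0625 and NC = 0.875, and the single changed pixel yields an MSE of 1, so the PSNR is 10·log10(255²), about 48.13 dB.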
Fig 3. Embedding map formation.

Fig 4. a) Input image, b) watermark bits, c) watermarked image

B. Results
We test our method with different parameters and compare the results with [5]. The maximum PSNR and SSIM that our approach achieves are 49.1052 dB and 0.9985, respectively, which are higher than those of [5]. For example, at equal SSIM we obtain a PSNR of 43.6043 dB, whereas the method of Jamali et al. [5] obtains 42.4306 dB. The results in Table 1 show that our method is more robust against different attacks than [5]. The attacks that we test are Gaussian noise (GN), JPEG compression (JC), 3×3 median filter (MF), histogram equalization (HE), and salt and pepper noise (S&P). We use three settings each for GN and JC, which give the noise variance and the compression ratio, respectively. Figure 5 shows the effect of $k_\alpha$ on PSNR and SSIM when $c_P$ and $c_C$ are both 1. As $k_\alpha$ increases from 0.01 to 1, PSNR and SSIM decrease. One of the results of our method is shown in Figure 4.

IV. CONCLUSION
In this paper, we proposed a method that uses a neural network to find the strength factor for ROI-based blind image watermarking. Each watermark bit is embedded 15 times in the images of the COCO dataset to improve redundancy. Our method is robust against different attacks. Compared with similar work, our approach gives better results and, at equal NC, achieves higher PSNR and SSIM.

REFERENCES

[1] A. Roy, A. K. Maiti, and K. Ghosh, “An HVS Inspired Robust Non-blind Watermarking Scheme in YCbCr Color Space,” Int. J. Image Graph., vol. 18, no. 03, p. 1850015, 2018.
[2] F. Ernawan and M. N. Kabir, “A robust image watermarking technique with an optimal DCT-psychovisual threshold,” IEEE Access, vol. 6, pp. 20464–20480, 2018.
[3] A. Mairgiotis, L. P. Kondi, and Y. Yang, “DCT/DWT blind multiplicative watermarking through Student-t distribution,” in Image Processing (ICIP), 2017 IEEE International Conference on, 2017, pp. 520–524.
[4] C. Dong, J. Li, and Y. Chen, “A DWT-DCT Based Robust Multiple Watermarks for Medical Image,” in Photonics and Optoelectronics (SOPO), 2012 Symposium on, 2012, pp. 1–4.
[5] M. Jamali, S. Samavi, N. Karimi, S. M. R. Soroushmehr, K. Ward, and K. Najarian, “Robust watermarking in non-ROI of medical images based on DCT-DWT,” in Engineering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual International Conference of the, 2016, pp. 1200–1203.
[6] M. Jamali, M. Bagheri, N. Karimi, and S. Samavi, “Robustness and Imperceptibility Enhancement in Watermarked Images by Color Transformation,” arXiv preprint arXiv:1911.00772, 2019.
[7] H. Liu, J. Liu, and M. Zhao, “Visual Saliency Model-Based Image Watermarking with Laplacian Distribution,” Information, vol. 9, no. 9, p. 239, 2018.
[8] M. Ali and C. W. Ahn, “An optimal image watermarking approach through Cuckoo search algorithm in wavelet domain,” Int. J. Syst. Assur. Eng. Manag., vol. 9, no. 3, pp. 602–611, 2018.
[9] A. Rajpal, A. Mishra, and R. Bala, “Multiple scaling factors based semi-blind watermarking of grayscale images using OS-ELM neural network,” in Signal Processing, Communications and Computing (ICSPCC), 2016 IEEE International Conference on, 2016, pp. 1–6.
[10] W. Zheng et al., “Robust and high capacity watermarking for image based on DWT-SVD and CNN,” in , 2018, pp. 1233–1237.
[11] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Computer Vision (ICCV), 2017 IEEE International Conference on, 2017, pp. 2980–2988.
[12] “http://cocodataset.org.”
[13] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99.
Fig 5. a) PSNR (dB), b) SSIM for different values of k.

Table 1. Results for different attacks

Attack             Proposed          [5]
                   BER    NC         BER      NC
GN 0.01            0      1          0        1
[label missing]    0      1          0        1
[label missing]    0      1          0        1
[label missing]    0      1          0        1
HE                 0      1          0        1
S&P                0      1          0.0128   0.9745