CNN Based Adversarial Embedding with Minimum Alteration for Image Steganography

Abstract

Historically, steganographic schemes were designed to preserve image statistics or steganalytic features. Since most state-of-the-art steganalytic methods employ a machine learning (ML) based classifier, it is reasonable to consider countering steganalysis by trying to fool the ML classifiers. However, simply applying perturbations to stego images as adversarial examples may lead to the failure of data extraction and introduce unexpected artefacts detectable by other classifiers. In this paper, we present a steganographic scheme with a novel operation called adversarial embedding, which achieves the goal of hiding a stego message while at the same time fooling a convolutional neural network (CNN) based steganalyzer. The proposed method works under the conventional framework of distortion minimization. Adversarial embedding is achieved by adjusting the costs of image element modifications according to the gradients backpropagated from the CNN classifier targeted by the attack. Therefore, the modification direction has a higher probability to be the same as the sign of the gradient. In this way, the so-called adversarial stego images are generated. Experiments demonstrate that the proposed steganographic scheme is secure against the targeted adversary-unaware steganalyzer. In addition, it deteriorates the performance of other adversary-aware steganalyzers, opening the way to a new class of modern steganographic schemes capable of overcoming powerful CNN-based steganalysis.


arXiv [cs.MM], Mar.

Weixuan Tang, Bin Li*, Shunquan Tan, Mauro Barni, and Jiwu Huang


Index Terms

Steganography, steganalysis, adversarial machine learning.

I. INTRODUCTION

Image steganography is the art and science of concealing covert information within images. It is usually achieved by modifying image elements, such as pixels or DCT coefficients. On the other side of the game, steganalysis aims to reveal the presence of secret information by detecting whether there are abnormal artefacts left by data embedding.

The history of steganography and steganalysis is rich in interesting stories, as the two compete with each other and both benefit and evolve from the competition [1]. The earliest steganographic method was implemented by substituting the least significant bits of image elements with message bits. The stego artefacts introduced by this method can be effectively detected by the Chi-squared attack [2], or by steganalytic features based on first-order statistics [3]. In this initial phase of the competition, statistical hypothesis testing or a simple linear classifier such as FLD (Fisher Linear Discriminant) could serve the need of steganalysis.

The first-order statistics can be restored after data embedding, as done in [4]. The abnormal artefacts in the first-order statistics can also be avoided as in [5], [6]. As a consequence, more powerful steganalytic features based on second-order statistics [7], [8] were proposed. In this period, advanced machine learning (ML) tools, such as SVM (Support Vector Machine), were operated on high-dimensional features (where the dimension is typically several hundred). These methods were very effective in detecting steganographic schemes even if the first-order statistics were preserved.

Modern steganographic schemes are designed under the framework of distortion minimization [9]. The embedding cost of changing each image element is specified by a cost function, and a coding scheme is employed to convey information while minimizing the distortion, which is computed as the total cost of modified elements. For example, the schemes in [10]–[15] are well-known for their elegant cost functions.
As a countermeasure, state-of-the-art steganalytic methods adopt higher-order statistics with much higher dimensional features (where the dimension is typically in the thousands or even more than ten thousand), such as in [16]–[20]. More sophisticated ML methods, such as the ensemble classifier [21], have also been employed.

Steganalytic methods based on deep learning [22]–[27] have rapidly gained increasing attention in recent years. Without the need to design hand-crafted features, deep convolutional neural networks (CNN) offer a promising way of automatic feature extraction and classification for steganalysis. By incorporating some domain knowledge into the network design, such as using high-pass filters for pre-processing, outstanding performance can be obtained.

The high-dimensional hand-crafted or deep-learned features with the powerful supervised ML schemes present a great challenge to steganography. A promising strategy for the steganographer is to use side information which is not available to the steganalyst, such as using the camera sensor noise during message embedding [28] and the compression noise during JPEG compression [12]. However, the side information is not always available for all kinds of cover images, especially for those already compressed in JPEG format. As a consequence, better steganographic schemes suitable for more general conditions are needed.

As the dimension of steganalytic features increases, it is difficult for steganography to preserve all statistical features during data embedding. This motivates us to find a better way to resist steganalysis by countering the ML based classifier. Recent studies [29], [30] have shown that ML systems are vulnerable to intentional adversarial operations. For example, Chen et al. [31] have shown that the performance of an image forensics detector with an SVM classifier can be greatly degraded by a rather simple gradient based attack. There is also some research evidence indicating that classifiers based on deep learning can be easily fooled by adversarial examples [32]–[34], which are formed by applying small but intentional perturbations to inputs in order to make the classification model yield erroneous outputs. However, applying adversarial perturbations as in [32] to stego images may lead to data extraction failures. The perturbations may also introduce unexpected artefacts detectable by other classifiers.

The progress in adversarial signal processing [35] inspired us to design a steganographic scheme that is resistant against ML based steganalyzers. In this paper, we propose a scheme called AMA (Adversarial embedding with Minimum Alteration). Targeting the Xu-CNN JPEG steganalyzer [26], we generate a new kind of stego images via adversarial embedding, an operation that takes into account both the embedding of the stego message and the necessity to fool the targeted steganalyzer. AMA is implemented under the framework of distortion minimization, and is based on a baseline steganographic scheme adopting a conventional embedding mechanism. Specifically, AMA adapts the cost assignment process by asymmetrically adjusting a portion of the embedding costs according to the gradients backpropagated from the deep learning steganalyzer. In order to avoid unnecessary extra modifications, the number of image elements with adjustable costs is kept to a minimum. Experimental results show that the adversarial stego images generated by AMA with adversarial embedding can successfully fool the targeted deep learning steganalyzer, which was trained with several hundred thousand training images. More interestingly, although the adversarial stego images have a higher rate of modifications, they are less detectable by other advanced hand-crafted feature based steganalyzers than stego images generated by the baseline steganographic scheme. These results seem to suggest that, guided by the necessity to fool the data-driven deep learning steganalyzer, adversarial embedding implicitly preserves the image statistics to some extent.

The main contributions of our work are as follows:
1) A new strategy to fool the ML classifiers, which is not based on the attempt to preserve a specific image statistical model, is proposed. We believe this is a promising way to counter steganalysis.
2) A practical steganographic scheme called AMA with the adversarial embedding operation is proposed. As opposed to conventional approaches used to create adversarial examples in other machine learning domains, adversarial stego images generated by the AMA scheme are capable of carrying secret information.
3) Based on the knowledge available to the steganographer and the steganalyst, different adversarial models are considered, wherein the proposed scheme can achieve state-of-the-art security performance.

The rest of the paper is organized as follows. In Section II, we formulate the problem of steganography and steganalysis. We point out the foundation of the proposed steganographic scheme, and differentiate two kinds of adversarial scenarios. We present the idea as well as a practical implementation of the proposed AMA steganographic scheme in Section III. Extensive experiments are performed and the results are reported in Section IV to demonstrate the performance of the AMA scheme under different adversarial conditions when compared to the baseline steganographic method. Conclusions are presented in Section V.

W. Tang, B. Li, S. Tan, and J. Huang are with Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518060, China. M. Barni is with Department of Information Engineering and Mathematics, University of Siena, Siena 53100, Italy. W. Tang is also with Sun Yat-sen University, Guangdong 510006, China. *B. Li is the corresponding author.

II. PROBLEM FORMULATION

In this article, bold capital letters are used to represent matrices. The corresponding lowercase letters are used for matrix elements. Calligraphic letters are used for sets. Specifically, cover and stego images are respectively denoted as $\mathbf{C} = (c_{i,j})^{H \times W}$ and $\mathbf{S} = (s_{i,j})^{H \times W}$, where $H$ and $W$ are the height and width of the image. Their corresponding image sets are denoted as $\mathcal{C}$ and $\mathcal{S}$. In order to differentiate the proposed adversarial stego images from conventional stego images, we use $\mathbf{Z} = (z_{i,j})^{H \times W} \in \mathcal{Z}$. Note that $\mathbf{Z}$ is a special type of $\mathbf{S}$.

A. Practical Evaluation Metrics for Steganographic Security

The fundamental requirement of steganalysis is to differentiate stego images from cover images. When analyzing an image $\mathbf{X}$, the steganalyst must decide between the following two hypotheses:

$$H_0: \mathbf{X} \text{ is a cover image}, \qquad H_1: \mathbf{X} \text{ is a stego image}. \tag{1}$$

To accomplish this task in a supervised ML setting, the steganalyst would train a classifier $\phi_{\mathcal{C},\mathcal{S}}$ with binary output using training data from $\mathcal{C}$ and $\mathcal{S}$, and obtain the decision criterion as follows:

$$\begin{cases} \mathbf{X} \text{ is a cover image}, & \text{if } \phi_{\mathcal{C},\mathcal{S}}(\mathbf{X}) = 0, \\ \mathbf{X} \text{ is a stego image}, & \text{if } \phi_{\mathcal{C},\mathcal{S}}(\mathbf{X}) = 1. \end{cases} \tag{2}$$

The trained classifier is called a steganalyzer. Two types of errors can occur. Type I error is the missed detection, where stego images are misclassified, and Type II error is the false alarm, where cover images are misclassified. Their corresponding error probabilities are defined as:

$$P_{md}^{\phi_{\mathcal{C},\mathcal{S}}} = \Pr\{\phi_{\mathcal{C},\mathcal{S}}(\mathbf{S}) = 0\}, \tag{3}$$

and

$$P_{fa}^{\phi_{\mathcal{C},\mathcal{S}}} = \Pr\{\phi_{\mathcal{C},\mathcal{S}}(\mathbf{C}) = 1\}. \tag{4}$$

Under equal Bayesian prior for cover and stego, the total error rate is

$$P_{e}^{\phi_{\mathcal{C},\mathcal{S}}} = \frac{P_{md}^{\phi_{\mathcal{C},\mathcal{S}}} + P_{fa}^{\phi_{\mathcal{C},\mathcal{S}}}}{2}. \tag{5}$$

The goal of the steganalyst is to minimize $P_{e}^{\phi_{\mathcal{C},\mathcal{S}}}$, while that of the steganographer is to maximize it.

B. Steganographer's Knowledge about Steganalyzer

The steganographer may have different levels of knowledge about $\phi_{\mathcal{C},\mathcal{S}}$, such as the classification scheme and the training data. In this paper, we will not discuss the best strategy for the steganographer according to the accessibility of this knowledge. Instead, we assume that the gradients of the loss function with respect to the input, which are backpropagated from an ML based steganalyzer $\phi_{\mathcal{C},\mathcal{S}}$, are accessible to the steganographer. This is the foundation of the proposed steganographic scheme. In Section III, we will propose a scheme to fool such a steganalyzer with adversarial stego images. We will also investigate in the experimental part how the adversarial stego images perform under other advanced steganalyzers (e.g., $\phi'_{\mathcal{C},\mathcal{S}}$) where the knowledge of these steganalyzers is unavailable.

C. Steganalyst's Knowledge about Adversarial Stego Images

If a steganalyst is unaware of any adversarial operation, he is called an adversary-unaware steganalyst. Otherwise, he is called an adversary-aware steganalyst. The best reaction of an adversary-aware steganalyst is to re-train the classifier with adversarial stego samples to obtain a new steganalyzer $\phi_{\mathcal{C},\mathcal{Z}}$, or to use other advanced steganalyzers (e.g., $\phi'_{\mathcal{C},\mathcal{Z}}$) unknown to the steganographer. These may be the two most challenging cases for a steganographer, and we will discuss these scenarios in the experiments.

III. THE PROPOSED AMA STEGANOGRAPHIC SCHEME

In this section, we will propose a novel steganographic scheme, which is called AMA, to counter a targeted steganalyzer. First, we will outline the basic idea of the proposed scheme. Then we will discuss the two most important operations in the proposed scheme, i.e., adversarial embedding and minimum alteration, in detail. Finally, we will give a practical implementation of AMA.

A. Basic Idea

In the proposed scheme, the image elements are randomly divided into two groups, i.e., a common group containing common elements, and an adjustable group containing adjustable elements. Data embedding is performed in two phases. In the first phase, a portion of the stego message is embedded into the common group by using a conventional steganographic scheme. In the second phase, the remaining part of the stego message is embedded into the adjustable group by using the proposed adversarial embedding scheme. Adjustable elements are modified in such a way that a targeted steganalyzer would output a wrong class label. We use a well-known deep learning based steganalyzer, i.e., Xu's CNN [26], as the targeted steganalyzer, since the gradient values of its loss function with respect to the input can be used to guide the modification of adjustable elements. Other steganalyzers possessing such a property may also be used. The details will be given in Section III-B. In order to prevent over-adaptation to the targeted steganalyzer and to enhance the security performance against other advanced steganalyzers, the number of adjustable elements is minimized, resulting in a minimization problem with constraints. The details will be given in Section III-C.
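As a rough illustration of the random two-group split, the sketch below shuffles the element indices with a seeded generator and cuts the resulting order at a fraction of the elements. The function name `split_groups`, the use of Python's `random.Random` as the keyed shuffler, and the rounding rule are assumptions made for illustration; they are not taken from the paper, whose own ordering is described in Section III-D.

```python
import random


def split_groups(num_elements, beta, key):
    """Split element indices into (common, adjustable) groups.

    beta is the fraction of elements placed in the adjustable group.
    key seeds the shuffle, so the same key always yields the same split.
    (Hypothetical helper; a real scheme would use its own keyed ordering.)
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    order = list(range(num_elements))
    random.Random(key).shuffle(order)  # keyed pseudo-random embedding order
    n_common = round(num_elements * (1.0 - beta))
    return order[:n_common], order[n_common:]


# Toy example: 16 elements, a quarter of them adjustable.
common, adjustable = split_groups(num_elements=16, beta=0.25, key=1234)
print(len(common), len(adjustable))
```

With `beta = 0` every element is a common element, which corresponds to purely conventional embedding.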

B. Adversarial Embedding

Denote $y$ as the ground truth label of $\mathbf{X}$. In steganalysis, we have $y \in \{0, 1\}$, where 0 indicates a cover and 1 indicates a stego. Let $L(\mathbf{X}, y; \phi_{\mathcal{C},\mathcal{S}})$ be the loss function of a steganalyzer $\phi_{\mathcal{C},\mathcal{S}}$. For example, for a deep neural network steganalyzer, the binary decision could be given as

$$\phi_{\mathcal{C},\mathcal{S}}(\mathbf{X}) = \begin{cases} 0, & \text{if } F(\mathbf{X}) < 0.5, \\ 1, & \text{if } F(\mathbf{X}) \geq 0.5, \end{cases} \tag{6}$$

where $F(\mathbf{X}) \in [0, 1]$ is the network output indicating the probability that $\mathbf{X}$ is a stego. The corresponding loss function may be designed in the form of cross entropy as

$$L(\mathbf{X}, y; \phi_{\mathcal{C},\mathcal{S}}) = -y \log(F(\mathbf{X})) - (1 - y) \log(1 - F(\mathbf{X})). \tag{7}$$

In [32]–[34], adversarial examples are generated to fool ML models by updating input elements $x_{i,j}$ according to the gradient of the loss function with respect to the input (abbreviated as gradient if it is not specified otherwise), i.e., $\nabla_{x_{i,j}} L(\mathbf{X}, \hat{y}; \phi_{\mathcal{C},\mathcal{S}})$, by using a targeted label $\hat{y}$. However, it is impossible to directly apply these methods for securing steganography. In fact, modifying the elements of a stego image may lead to the failure of data extraction, thus contradicting the aim of steganography. This motivates us to design an embedding method with two objectives of equal importance: performing the adversarial operation to combat the steganalyzer $\phi_{\mathcal{C},\mathcal{S}}$ and performing data embedding to carry information. To this end, we propose a method that we will call adversarial embedding to generate adversarial stego images under the framework of steganographic distortion minimization [9].

In the distortion minimization framework, steganography is formulated as an optimization problem with a payload constraint, i.e.,

$$\min_{\mathbf{S}} D(\mathbf{C}, \mathbf{S}), \quad \text{s.t. } \psi(\mathbf{S}) = k, \tag{8}$$

where $D(\mathbf{C}, \mathbf{S})$ is a function measuring the distortion caused by modifying $\mathbf{C}$ to $\mathbf{S}$, and $\psi(\mathbf{S})$ represents the message payload extracted from $\mathbf{S}$ (measured in bits). A typical additive distortion function for ternary embedding, such as those used in [11]–[15], is defined as:

$$D(\mathbf{C}, \mathbf{S}) = \sum_{i=1}^{H} \sum_{j=1}^{W} \rho^{+}_{i,j}\, \delta(m_{i,j} - 1) + \rho^{-}_{i,j}\, \delta(m_{i,j} + 1), \tag{9}$$

where $m_{i,j} = s_{i,j} - c_{i,j}$ is the difference between the cover and the stego elements, $\delta(\cdot)$ is an indicator function:

$$\delta(x) = \begin{cases} 1, & x = 0, \\ 0, & \text{otherwise}, \end{cases} \tag{10}$$

and $\rho^{+}_{i,j}$ and $\rho^{-}_{i,j}$ are respectively the cost of increasing and decreasing $c_{i,j}$ by 1. Although different steganographic schemes may employ different cost functions, a rule of thumb is that large cost values, which lead to low probabilities of modification, are assigned to elements more likely to introduce abnormal artefacts, and vice versa. In most schemes, $\rho^{+}_{i,j} = \rho^{-}_{i,j}$, leading to equal probabilities of increasing or decreasing $c_{i,j}$. With the CMD (clustering modification direction) strategy [36], [37], the costs of increasing or decreasing are asymmetrically updated during embedding in favor of a synchronized direction in the neighborhood.

In [32], it is observed that when a perturbation signal associated with a targeted label is added to the input, the updated input, called an adversarial example, is usually misclassified into the targeted class by the ML classifier. The perturbation signal can be designed in various ways, including using the gradient of the loss function with respect to the input. Since adding a perturbation with the inverse sign of the gradient has an adversarial effect, the objective of the proposed adversarial embedding is to modify image elements in such a way that the sign of the modification tends to be in accordance with the inverse sign of the gradient. To achieve such an objective with a high probability, together with data embedding, we operate under the distortion minimization framework by defining the embedding costs as follows:

$$\rho^{+}_{i,j} \begin{cases} < \rho^{-}_{i,j}, & \text{if } \nabla_{x_{i,j}} L(\mathbf{X}, \hat{y}; \phi_{\mathcal{C},\mathcal{S}}) < 0, \\ = \rho^{-}_{i,j}, & \text{if } \nabla_{x_{i,j}} L(\mathbf{X}, \hat{y}; \phi_{\mathcal{C},\mathcal{S}}) = 0, \\ > \rho^{-}_{i,j}, & \text{if } \nabla_{x_{i,j}} L(\mathbf{X}, \hat{y}; \phi_{\mathcal{C},\mathcal{S}}) > 0. \end{cases} \tag{11}$$

Such costs yield asymmetric probabilities of increasing and decreasing the element $x_{i,j}$, if the gradient is not zero.
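To make the additive distortion in (9) and the effect of asymmetric costs concrete, here is a minimal Python sketch. It works on flattened lists rather than $H \times W$ matrices. The function names and toy numbers are hypothetical. The Gibbs-style form `exp(-lam * cost)` for turning costs into change probabilities is an assumption borrowed from the usual treatment of distortion minimization, with `lam` left as a free parameter instead of being solved from the payload constraint.

```python
import math


def additive_distortion(cover, stego, rho_plus, rho_minus):
    """Total cost as in (9): rho_plus where s - c = +1, rho_minus where s - c = -1."""
    total = 0.0
    for c, s, rp, rm in zip(cover, stego, rho_plus, rho_minus):
        m = s - c
        if m == 1:
            total += rp
        elif m == -1:
            total += rm
        elif m != 0:
            raise ValueError("ternary embedding only allows changes of -1, 0, +1")
    return total


def change_probabilities(rho_plus, rho_minus, lam):
    """Assumed Gibbs-form probabilities (p_plus, p_minus) for a single element."""
    w_plus = math.exp(-lam * rho_plus)
    w_minus = math.exp(-lam * rho_minus)
    z = 1.0 + w_plus + w_minus  # weight 1 corresponds to "no change" at zero cost
    return w_plus / z, w_minus / z


cover = [10, -3, 0, 7]
stego = [11, -3, -1, 7]          # one +1 change and one -1 change
rho_plus = [0.5, 2.0, 1.0, 4.0]
rho_minus = [0.5, 2.0, 3.0, 4.0]
print(additive_distortion(cover, stego, rho_plus, rho_minus))  # 0.5 + 3.0 = 3.5

# A cheaper +1 than -1 makes an increase the more likely modification.
print(change_probabilities(rho_plus=1.0, rho_minus=2.0, lam=1.0))
```

Under this assumed probability model, symmetric costs give equal probabilities for both directions, while any cost asymmetry biases the modification toward the cheaper direction, which is the behavior that (11) relies on.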
In this way, data can be embedded into the image elements, and the direction of the modification has the effect of inducing the steganalyzer $\phi_{\mathcal{C},\mathcal{S}}$ to decide for the targeted label $\hat{y} = 0$.

C. Minimum Alteration of Adjustable Elements

With adversarial embedding, the adversarial stego images may effectively evade steganalysis. However, since the costs of increasing and decreasing are asymmetric, the number of changed image elements increases. The reason is that the maximum entropy can only be obtained when the image element has an equal probability of increasing and decreasing. With the payload constraint, asymmetric costs lead to a higher change rate when compared to symmetric costs. Although a higher change rate may not necessarily lead to a worse security performance, we would still like to minimize it by reducing the frequency of adversarial embedding. This is due to three facts. First, it is sufficient to fool the ML classifier by using only a part of the elements to perform the adversarial operation, as shown in [38]. In fact, it is even unnecessary to perform adversarial embedding on those stego images which are generated by conventional steganographic schemes but are already incorrectly classified by the imperfect steganalyzer. Second, if all elements are used for adversarial embedding, the generated adversarial stego images may be overly adapted to the targeted steganalyzer and may possibly become more detectable by other advanced steganalyzers. We may perform a minimum amount of alteration to prevent introducing other detectable artefacts that can be exploited by an adversary-aware steganalyzer. Third, when the change rate is minimized, the image quality should be better preserved.

We propose to divide image elements into two groups, i.e., a common group containing common elements for conventional steganographic embedding, and an adjustable group containing adjustable elements for adversarial embedding. The objective is that the number of adjustable elements should be minimized while the targeted steganalyzer should output a wrong class label. Mathematically speaking, the problem is formulated as

$$\min \beta, \quad \text{s.t. } \phi_{\mathcal{C},\mathcal{S}}(\mathbf{Z}) = 0 \text{ and } \psi(\mathbf{Z}) = k, \tag{12}$$

where $\beta \in [0, 1]$ denotes the ratio of the number of adjustable elements to the number of all image elements. It is obvious that there is no explicit solution to such a problem. To solve it efficiently, the targeted steganalyzer is employed to numerically search for a "just enough" number of adjustable elements to satisfy the constraints in (12). The details will be described in the next subsection.

D. A Practical Implementation of AMA

In this part, we present a practical AMA steganographic scheme. Since JPEG images are widely used and pervasive on the Internet, we use them as cover. We will use Xu-CNN [26] as the targeted steganalyzer and J-UNIWARD [12] as the baseline steganographic scheme for conventional data embedding. However, other image formats, steganalyzers, or conventional embedding schemes may also be applicable, as indicated in Section III-A. The detailed steps of the proposed scheme are described as follows.

1) For a cover image $\mathbf{C} = (c_{i,j})^{H \times W}$, use a conventional cost function (such as in J-UNIWARD) to compute the initial embedding costs, i.e., $\{\rho^{+}_{i,j}, \rho^{-}_{i,j}\}$, for the DCT coefficients. Initialize the parameter $\beta = 0$.

2) Divide the elements in $\mathbf{C}$ into two disjoint groups, i.e., a common group containing $l_1 = [H \times W \times (1 - \beta)]$ common elements, and an adjustable group containing $l_2 = H \times W - l_1$ adjustable elements. The positions of these two kinds of elements can be fixed in advance or randomized, with the details of the randomization to be discussed later.

3) Embed $k_1 = [k \times (1 - \beta)]$ bits into the common group using the initial embedding costs computed in Step 1 by applying a distortion minimization coding scheme, such as STC (syndrome-trellis codes) [39]. The resulting image is denoted as $\mathbf{Z}_c$.

4) Compute the gradients $\nabla_{z_{i,j}} L(\mathbf{Z}_c, \hat{y}; \phi_{\mathcal{C},\mathcal{S}})$ of the steganalyzer using the targeted label $\hat{y} = 0$. Update the embedding costs for the adjustable elements by

$$q^{+}_{i,j} = \begin{cases} \rho^{+}_{i,j} / \alpha, & \text{if } \nabla_{z_{i,j}} L(\mathbf{Z}_c, \hat{y}; \phi_{\mathcal{C},\mathcal{S}}) < 0, \\ \rho^{+}_{i,j}, & \text{if } \nabla_{z_{i,j}} L(\mathbf{Z}_c, \hat{y}; \phi_{\mathcal{C},\mathcal{S}}) = 0, \\ \rho^{+}_{i,j} \cdot \alpha, & \text{if } \nabla_{z_{i,j}} L(\mathbf{Z}_c, \hat{y}; \phi_{\mathcal{C},\mathcal{S}}) > 0, \end{cases} \tag{13}$$

$$q^{-}_{i,j} = \begin{cases} \rho^{-}_{i,j} / \alpha, & \text{if } \nabla_{z_{i,j}} L(\mathbf{Z}_c, \hat{y}; \phi_{\mathcal{C},\mathcal{S}}) > 0, \\ \rho^{-}_{i,j}, & \text{if } \nabla_{z_{i,j}} L(\mathbf{Z}_c, \hat{y}; \phi_{\mathcal{C},\mathcal{S}}) = 0, \\ \rho^{-}_{i,j} \cdot \alpha, & \text{if } \nabla_{z_{i,j}} L(\mathbf{Z}_c, \hat{y}; \phi_{\mathcal{C},\mathcal{S}}) < 0, \end{cases} \tag{14}$$

where $\alpha$ is a scaling factor set to 2 in this work. Embed $k_2 = k - k_1$ bits into the adjustable elements by using the updated embedding costs computed from (13) and (14) and the same coding scheme used for the common group. The resultant image is $\mathbf{Z}$.

5) Take $\mathbf{Z}$ as the input of the steganalyzer $\phi_{\mathcal{C},\mathcal{S}}$. If $\phi_{\mathcal{C},\mathcal{S}}(\mathbf{Z}) = 0$, which means the adversarial stego $\mathbf{Z}$ can fool the steganalyzer with a minimum value of $\beta$, use $\mathbf{Z}$ as the output and terminate the embedding process. Otherwise, the number of adjustable elements may not be enough. In this case, update $\beta$ by $\beta + \Delta\beta$, and repeat Step 2 to Step 5 until $\beta = 1$. We use $\Delta\beta$ = 0.[...] in this work. If $\beta = 1$ and $\phi_{\mathcal{C},\mathcal{S}}(\mathbf{Z}) = 1$, which corresponds to the failure case of adversarial embedding, we just use a conventional steganographic scheme for embedding and output a conventional stego image.

Since the same coding scheme, such as STC, is used both in the adjustable group and the common group, the message receiver neither needs to be informed about the value of $\beta$, nor needs to know which image elements belong to the adjustable group or the common group. Data is extracted in the same way as in the baseline steganographic scheme.

In most existing steganographic schemes, an embedding order of image elements is generated by scrambling the indexes of image elements, where the scrambling operation is determined by a secret key shared between the sender and the receiver. The secret key can be fixed for different images, or changed as a session key. In the AMA implementation, the positions of the common elements and those of the adjustable elements can be determined as follows. First, generate an embedding order in the same way as the baseline steganographic scheme. Then, the common group is formed by the first $l_1 = [H \times W \times (1 - \beta)]$ elements according to the embedding order. Finally, the adjustable group is formed by the remaining elements.
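The cost update in (13) and (14) and the search over $\beta$ in Steps 2 to 5 can be sketched as follows. This is only an outline under stated assumptions: `adjust_costs` and `ama_search` are hypothetical names, the embedding itself (cost computation, STC coding, gradient evaluation) is hidden behind a caller-supplied `embed(beta)` callable, the steganalyzer is a caller-supplied `classify(image)` callable returning 0 or 1, and the default `delta_beta=0.1` is a placeholder value rather than a figure taken from the text.

```python
def adjust_costs(rho_plus, rho_minus, grads, alpha=2.0):
    """Apply the asymmetric update of (13)-(14) to flattened cost lists."""
    q_plus, q_minus = [], []
    for rp, rm, g in zip(rho_plus, rho_minus, grads):
        if g < 0:        # negative gradient: make +1 cheaper, -1 dearer
            q_plus.append(rp / alpha)
            q_minus.append(rm * alpha)
        elif g > 0:      # positive gradient: make -1 cheaper, +1 dearer
            q_plus.append(rp * alpha)
            q_minus.append(rm / alpha)
        else:            # zero gradient: keep the original costs
            q_plus.append(rp)
            q_minus.append(rm)
    return q_plus, q_minus


def ama_search(embed, classify, delta_beta=0.1):
    """Raise beta from 0 in steps of delta_beta until the classifier outputs 0.

    embed(beta) stands in for Steps 2-4 and returns a candidate stego image;
    classify(image) stands in for the targeted steganalyzer (Step 5).
    Returns (image, beta) on success, or (None, None) in the failure case,
    where the caller would fall back to conventional embedding.
    """
    steps = round(1.0 / delta_beta)
    for n in range(steps + 1):
        beta = min(1.0, n * delta_beta)
        candidate = embed(beta)
        if classify(candidate) == 0:
            return candidate, beta
    return None, None


# Toy check of the cost update with alpha = 2.
print(adjust_costs([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [-0.2, 0.0, 0.3]))
```

Because the search starts at `beta = 0`, an image whose conventional stego version is already misclassified is returned without any adversarial adjustment.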
Consequently, the positions of adjustable elements can be fixed or randomized for different images, depending on whether the embedding order is fixed or randomized.

TABLE I: The security performance (in %) against an adversary-unaware steganalyzer. [Table values not recoverable. Columns: five groups of $P_{fa}$, $P_{md}$, $P_e$. Rows: steganalyzers $\phi_{\mathcal{C}_{B1},\mathcal{S}_{B1}}$, $\phi'_{\mathcal{C}_{B1},\mathcal{S}_{B1}}$, and $\phi''_{\mathcal{C}_{B1},\mathcal{S}_{B1}}$, each trained with J-UNIWARD [12] stego images and tested on $\{\mathcal{C}^{tst}_{B2}, \mathcal{S}^{tst}_{B2}\}$ and $\{\mathcal{C}^{tst}_{B2}, \mathcal{Z}^{tst}_{B2}\}$.]

TABLE II: The security performance (in %) against an adversary-aware steganalyzer. [Table values not recoverable. Columns: five groups of $P_{fa}$, $P_{md}$, $P_e$. Rows: for each of $\phi$, $\phi'$, and $\phi''$, a steganalyzer trained on $\{\mathcal{C}^{trn}_{B2}, \mathcal{S}^{trn}_{B2}\}$ and tested on J-UNIWARD [12] images $\{\mathcal{C}^{tst}_{B2}, \mathcal{S}^{tst}_{B2}\}$, and a steganalyzer trained on $\{\mathcal{C}^{trn}_{B2}, \mathcal{Z}^{trn}_{B2}\}$ and tested on the proposed AMA images $\{\mathcal{C}^{tst}_{B2}, \mathcal{Z}^{tst}_{B2}\}$.]

IV. EXPERIMENTS

In order to evaluate the performance of the proposed AMA scheme, we conduct the following experiments.
1) We evaluate the performance of AMA in the presence of an adversary-unaware steganalyst who trains his steganalyzer with conventional stego images. This corresponds to the most favorable case for the steganographer. It will be reported in Section IV-B.
2) We evaluate the performance of AMA in the presence of an adversary-aware steganalyst who re-trains his steganalyzer with adversarial stego images. This corresponds to the most challenging case for the steganographer. It will be reported in Section IV-C.
3) We simulate the situation when the knowledge of the steganographer and that of the steganalyst are alternately updated. To the best of our knowledge, this is the first work to investigate iterative adversarial conditions for steganography and steganalysis. It will be demonstrated in Section IV-D.
4) We show in Section IV-E why adversarial embedding guided by gradients and minimum alteration are important in the proposed scheme.
5) We discuss the role of randomizing the positions of the adjustable elements in Section IV-G.
6) We perform some experiments on another image set for further evaluation in Section IV-H.

The common settings and notations in the experiments are described in Section IV-A. Some statistical information about the stego image sets is provided in Section IV-F.

A. Settings

1) Image set: The following two cover image sets are used.

• Basic500k, denoted by $\mathcal{C}_B$. It is obtained by randomly selecting $5 \times 10^5$ JPEG images with size larger than 256 × 256 [...] 256 × 256 regions. The images are further converted to grayscale and re-compressed into JPEG format with quality factor 75. This dataset has been previously used in [27] to train CNN steganalyzers. Unless specified otherwise, the experiments are carried out on this image set. To use the images efficiently under different circumstances, $\mathcal{C}_B$ is randomly split into two disjoint subsets, $\mathcal{C}_{B1}$ and $\mathcal{C}_{B2}$, each with $2.5 \times 10^5$ images.

• JPEG-BOSSBase, denoted by $\mathcal{C}_J$. In order to verify the performance of AMA on an image set distinctly different from $\mathcal{C}_B$, we generate this set with the resizing operation and without any possible double JPEG compression artefacts. It is obtained based on 10,000 images from the public data set BOSSBase v1.01 [40]. The 512 × 512 PGM format images are resized to 256 × 256 with a Lanczos2 resampling kernel, and then compressed into JPEG format with quality factor 75. The experiments in Section IV-H are carried out on this image set. $\mathcal{C}_J$ is randomly split into two disjoint subsets, $\mathcal{C}_{J1}$ and $\mathcal{C}_{J2}$, each with 5,000 images.

2) Steganalyzers: Three different steganalyzers are used to evaluate the security of the steganographic schemes. The details are described as follows.

• Xu-CNN steganalyzer [26], denoted as $\phi$. To the best of our knowledge, it is the best performing data-driven JPEG CNN steganalyzer. The 20-layer CNN steganalyzer was proposed by Xu, and we build the CNN structure and set all training parameters as in [26], with the only difference that the batch size is set to 100 during the training stage, with 50 cover images and their corresponding stego counterparts. The CNN model trained at the [...]-th iteration is used as the steganalyzer.

• GFR steganalyzer [20], denoted as $\phi'$. It is based on 17000 histogram features generated with Gabor filters and an FLD ensemble classifier [21].

• DCTR steganalyzer [19], denoted as $\phi''$. It is based on 8000 dimensional DCT residual features and an FLD ensemble classifier [21].

The steganalytic performance is evaluated by the missed detection rate as in (3), the false alarm rate as in (4), and the total error rate as in (5).
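For concreteness, the three rates can be estimated from a steganalyzer's binary decisions on a test set as in the short sketch below. The function name `error_rates` and the toy decision lists are hypothetical, and the total error is taken as the average of the two rates, which assumes equal priors for cover and stego.

```python
def error_rates(cover_preds, stego_preds):
    """Estimate (P_fa, P_md, P_e) from 0/1 decisions (0 = cover, 1 = stego).

    cover_preds: decisions on cover test images.
    stego_preds: decisions on stego test images.
    """
    p_fa = sum(1 for p in cover_preds if p == 1) / len(cover_preds)  # false alarms
    p_md = sum(1 for p in stego_preds if p == 0) / len(stego_preds)  # missed detections
    p_e = (p_md + p_fa) / 2  # equal-prior average of the two error rates
    return p_fa, p_md, p_e


# One of four covers is flagged, and two of four stegos are missed.
print(error_rates(cover_preds=[0, 0, 1, 0], stego_preds=[1, 0, 0, 1]))
# -> (0.25, 0.5, 0.375)
```

Since the cover test images are the same whichever stego set they are paired with, `p_fa` stays fixed for a given steganalyzer and only `p_md` changes when the stego decisions do.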

3) Steganographic schemes: We use two steganographic schemes to generate stego images.

• J-UNIWARD [12]: It is used as a baseline steganographic scheme. The embedding costs of DCT coefficients are calculated in the wavelet domain using a Daubechies wavelet filter bank. The corresponding stego image sets are denoted as $\mathcal{S}_{B1}$, $\mathcal{S}_{B2}$, $\mathcal{S}_{J1}$, and $\mathcal{S}_{J2}$.

• AMA: In the proposed scheme, J-UNIWARD is used to compute the initial embedding costs and perform the conventional embedding. The steganalyzer $\phi_{\mathcal{C}_{B1},\mathcal{S}_{B1}}$ based on Xu-CNN is used as the targeted steganalyzer for adversarial embedding. The corresponding adversarial stego image sets are denoted as $\mathcal{Z}_{B1}$, $\mathcal{Z}_{B2}$, $\mathcal{Z}_{J1}$, and $\mathcal{Z}_{J2}$. The scaling parameter used in (13) and (14) is set to $\alpha = 2$, where we have tried $\alpha \in \{\ldots\}$ and found only minor differences in performance.

The optimal embedding simulator [9] is employed for both J-UNIWARD and AMA. The Matlab implementation of J-UNIWARD is used. Our proposed AMA scheme is implemented using TensorFlow with a Python interface. The experiments are run on an NVIDIA Tesla K80 GPU platform. The embedding payload is measured in bits per non-zero cover AC DCT coefficient (bpnzAC) as in [12], [26], [27]. In Sections IV-B and IV-C, we conduct experiments on 0.1, 0.2, 0.3, 0.4, and 0.5 bpnzAC. For the rest of the experiments, we use 0.4 bpnzAC since the steganalyzers perform better on higher payloads.

B. Performance against an Adversary-unaware Steganalyst

In this part, we study the case where the knowledge of the steganalyzer is exposed to the steganographer, but the steganalyst is unaware of the adversarial operation and still uses the current steganalyzer. In particular, we assume that the Xu-CNN steganalyzer $\phi_{C_B,S_B}$, which has been trained on the image set $\{C_B, S_B\}$, is available to the steganographer. Note that the steganographer does not necessarily need to have access to $\{C_B, S_B\}$ given that the steganalyzer $\phi_{C_B,S_B}$ is known. The steganographer can use $\phi_{C_B,S_B}$ to generate an adversarial stego set $Z_B$ from the cover set $C_B$. We would like to know how the steganalyzer $\phi_{C_B,S_B}$ performs on classifying $\{C_B^{tst}, Z_B^{tst}\}$ compared to classifying $\{C_B^{tst}, S_B^{tst}\}$. The experimental results are reported in Table I. Note that under the same payload rate, the false alarm rate $P_{fa}$ is the same for $\{C_B^{tst}, Z_B^{tst}\}$ and $\{C_B^{tst}, S_B^{tst}\}$, because the steganalyzer was trained on $\{C_B, S_B\}$ but tested on $C_B^{tst}$, which is shared by $\{C_B^{tst}, Z_B^{tst}\}$ and $\{C_B^{tst}, S_B^{tst}\}$. However, we can observe that the missed detection rate $P_{md}$ is much higher for $Z_B^{tst}$ than for $S_B^{tst}$. These results indicate that the adversarial stego images generated by AMA can effectively evade detection by the targeted steganalyzer.

In order to investigate the case where the adversarial stego images are analyzed by steganalyzers other than the targeted one, we conducted experiments using two advanced steganalyzers, i.e., $\phi'_{C_B,S_B}$ and $\phi''_{C_B,S_B}$, to perform the same classification tasks. The experimental results reported in Table I show that the performance of these detectors on the adversarial stego images is, at least to some extent, worse than that obtained on the stego images generated by J-UNIWARD. Although designed to fool a targeted steganalyzer, the AMA scheme also shows a certain effectiveness against non-targeted steganalyzers. We speculate that this adaptability to other steganalyzers is due to the following facts: 1) the selected data-driven steganalyzer $\phi_{C_B,S_B}$, trained on an image set containing hundreds of thousands of representative images, is very powerful and has better detecting ability than other steganalyzers; hence 2) resisting such a powerful steganalyzer may implicitly preserve the statistics of the cover images, as shown in Section IV-F, and can therefore also weaken other advanced steganalyzers (at least to some extent).

Footnote: The Matlab implementation of J-UNIWARD is downloaded from http://dde.binghamton.edu/download/stego_algorithms/

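TABLE III

THE SECURITY PERFORMANCE (IN %) OF THE ITERATIVE GAME WHEN THE STEGANALYZER EVOLVED ALTERNATIVELY BETWEEN ADVERSARY-UNAWARE AND ADVERSARY-AWARE.

| Round | Testing set | Steganalyzer | $P_{fa}$ | $P_{md}$ | $P_e$ |
|---|---|---|---|---|---|
| 1 | $\{C_B^{tst}, Z_B^{tst}\}$ | $\phi_{C_B,S_B}$ | | | |
| | | $\phi_{C_B^{trn},Z_B^{trn}}$ | | | |
| 2 | $\{C_B^{tst}, \dot{Z}_B^{tst}\}$ | $\phi_{C_B,Z_B}$ | | | |
| | | $\phi_{C_B^{trn},\dot{Z}_B^{trn}}$ | | | |
| 3 | $\{C_B^{tst}, \ddot{Z}_B^{tst}\}$ | $\phi_{C_B,\dot{Z}_B}$ | | | |
| | | $\phi_{C_B^{trn},\ddot{Z}_B^{trn}}$ | | | |
| 4 | $\{C_B^{tst}, \dddot{Z}_B^{tst}\}$ | $\phi_{C_B,\ddot{Z}_B}$ | | | |
| | | $\phi_{C_B^{trn},\dddot{Z}_B^{trn}}$ | | | |
| 5 | $\{C_B^{tst}, \ddddot{Z}_B^{tst}\}$ | $\phi_{C_B,\dddot{Z}_B}$ | | | |
| | | $\phi_{C_B^{trn},\ddddot{Z}_B^{trn}}$ | | | |

C. Performance against an Adversary-aware Steganalyst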

In this part, we study the case where the steganalyst is aware of the adversarial embedding operation. As stated in Section II-C, his best reaction is to re-train the steganalyzers with adversarial stego images. We conducted experiments by dividing the cover image set $C_B$ into $C_B^{trn}$ and $C_B^{tst}$, with . × and × images, respectively. The adversarial stego images $Z_B^{trn}$ and $Z_B^{tst}$ are generated as in Section IV-B, where the steganographer only relies on the steganalyzer $\phi_{C_B,S_B}$ to generate adversarial stego images. Then, we trained the steganalyzers on $\{C_B^{trn}, Z_B^{trn}\}$ and tested them on $\{C_B^{tst}, Z_B^{tst}\}$. In this way, the image sets used for data embedding (i.e., $C_B^{trn}$ and $C_B^{tst}$) and the one used for training the targeted steganalyzer (i.e., $C_B$) are different, thus ensuring that AMA does not use any prior knowledge of the image set.

The experimental results are reported in Table II. It can be observed that, compared to the targeted steganalyzer, which is easily fooled by the adversarial stego images, a re-trained steganalyzer can better detect the adversarial embedding operations. However, compared to the baseline J-UNIWARD scheme, the proposed AMA scheme still achieves a better security performance. For example, AMA gets a . total error rate for 0.4 bpnzAC, which is comparable to J-UNIWARD with . for 0.3 bpnzAC. This means that under the same risk level of detection, AMA attains 0.1 bpnzAC more payload. As also shown in Table II, when we use the other two non-targeted steganalyzers $\phi'$ and $\phi''$ for detection, higher total error rates are obtained on $\{C_B^{tst}, Z_B^{tst}\}$ than on $\{C_B^{tst}, S_B^{tst}\}$, showing, once again, that AMA outperforms the baseline scheme.

D. Sequential Iterative Game between Steganographer and Steganalyst

In this part, we study a scenario wherein the steganographer and the steganalyst adjust their strategies iteratively, each time adapting to their knowledge about the scheme adopted by the adversary. This process can be simulated by iteratively performing experiments similar to those in Sections IV-B and IV-C using the Xu-CNN steganalyzer. To define the iterations, we make the following assumptions for each round.

• The adversary-unaware steganalyst is unaware of the adversarial stego images generated in the current round. For the first round, plain stego images generated with the baseline steganographic scheme are used for training. For the subsequent rounds, adversarial stego images, which are generated in the same way as by the steganographer in the previous round, are used. The steganalyzer is trained on $C_B$ and its stego (or adversarial stego) counterpart.

• The steganographer sets the targeted steganalyzer to be the same as that of the adversary-unaware steganalyst in the current round and tries to attack it by generating adversarial stego images from $C_B$.

• The adversary-aware steganalyst is aware of the adversarial operation performed in the current round. He trains the classifier on $C_B^{trn}$ and the adversarial stego counterpart in the current round.

• To ease the comparison, $C_B^{tst}$ and its corresponding adversarial stego counterpart are used to evaluate the performance of both the adversary-unaware steganalyzer and the adversary-aware steganalyzer.

Although the iterative process can be endless, we performed five rounds of iterations to illustrate the dynamic effects. The adversary-unaware steganalyst uses J-UNIWARD to generate the plain stego image set $S$ in the first round. The steganographer generates the adversarial stego sets $Z$, $\dot{Z}$, $\ddot{Z}$, $\dddot{Z}$, and $\ddddot{Z}$ from the first to the fifth round, respectively. The embedding payload is set to 0.4 bpnzAC.
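The round structure described above can be summarized as a short control-flow sketch. It is only an outline under stated assumptions: `train`, `attack`, and `evaluate` are hypothetical callables supplied by the caller (they stand in for CNN training, adversarial embedding, and testing), and the strings `"full"` and `"trn"` are placeholder tags for the full cover set and its training split.

```python
def iterative_game(train, attack, evaluate, plain_stego, rounds=5):
    """Simulate the steganographer/steganalyst game for a number of rounds.

    train(stego_set, split)   -> steganalyzer trained on covers of `split` and `stego_set`
    attack(steganalyzer)      -> adversarial stego set generated against `steganalyzer`
    evaluate(steganalyzer, s) -> any performance summary on the test split of `s`
    All three callables are hypothetical placeholders, not part of the paper's code.
    """
    results = []
    known_stego = plain_stego  # round 1: the unaware steganalyst only knows plain stego images
    for r in range(1, rounds + 1):
        unaware = train(known_stego, "full")  # adversary-unaware steganalyzer of this round
        adversarial = attack(unaware)         # steganographer targets that steganalyzer
        aware = train(adversarial, "trn")     # adversary-aware steganalyzer, re-trained
        results.append((r,
                        evaluate(unaware, adversarial),
                        evaluate(aware, adversarial)))
        known_stego = adversarial             # next round's unaware steganalyst catches up
    return results
```

The only state carried between rounds is the stego set known to the adversary-unaware steganalyst, which always lags one round behind the steganographer.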
TABLE IV

THE SECURITY PERFORMANCE (IN %) WITH DIFFERENT SETTING FOR AMA UNDER THE PAYLOAD OF BPNZAC

Columns: Steganalyzer, Testing Set, and $P_{fa}$, $P_{md}$, $P_e$ for each of Case I, Case II (β = 0 . ), Case II (β = 0 . ), and Case II (β = 0 . ).
Rows: $\phi_{C_B,S_B}$ tested on $\{C_B^{tst}, Z_B^{tst}\}$; $\phi_{C_B^{trn},Z_B^{trn}}$ tested on $\{C_B^{tst}, Z_B^{tst}\}$.

According to the results shown in Table III, we can make the following observations.

1) In the same round, the adversary-unaware steganalyzer has a higher total error rate than the adversary-aware steganalyzer, mainly due to its higher missed detection rate. This implies that the steganographer can always effectively fool the targeted steganalyzer, while the adversary-aware steganalyst effectively exploits the knowledge about the adversary’s operations.

2) As the iteration goes on, although the gap in missed detection rate between the adversary-unaware steganalyzer and the adversary-aware steganalyzer fluctuates ( i.e., i.e., i.e.,

E. Investigation on Two Important Components in AMA

Performing adversarial embedding according to the inverse signs of the gradients and using minimum alteration are the two most important components of the AMA scheme. We therefore conducted experiments to investigate the effectiveness of each component. Both adversary-unaware and adversary-aware CNN steganalyzers are used for the evaluation, and the embedding payload is set to 0.4 bpnzAC.

1) Case I: reversing adversarial embedding operation:

In the AMA scheme, the embedding costs of adjustable elements are asymmetrically adjusted according to the inverse signs of the gradients, as shown in (13) and (14). Now, we use the signs of the gradients, instead of the inverse signs, as in the following equations, to perform a comparative experiment:

$$
q^{+}_{i,j} =
\begin{cases}
\rho^{+}_{i,j}/\alpha, & \text{if } \nabla_{z_{i,j}} L(Z_c, \phi_{C,S}) > 0, \\
\rho^{+}_{i,j}, & \text{if } \nabla_{z_{i,j}} L(Z_c, \phi_{C,S}) = 0, \\
\rho^{+}_{i,j} \cdot \alpha, & \text{if } \nabla_{z_{i,j}} L(Z_c, \phi_{C,S}) < 0,
\end{cases}
\tag{15}
$$

$$
q^{-}_{i,j} =
\begin{cases}
\rho^{-}_{i,j}/\alpha, & \text{if } \nabla_{z_{i,j}} L(Z_c, \phi_{C,S}) < 0, \\
\rho^{-}_{i,j}, & \text{if } \nabla_{z_{i,j}} L(Z_c, \phi_{C,S}) = 0, \\
\rho^{-}_{i,j} \cdot \alpha, & \text{if } \nabla_{z_{i,j}} L(Z_c, \phi_{C,S}) > 0.
\end{cases}
\tag{16}
$$

The results are shown in Table IV. Compared with the previous results (see Table I and II), the total error rate of the adversary-unaware steganalyzer drops from 58.5% to 21.6%, and that of the adversary-aware steganalyzer from 25.8% to 19.3%. The degraded performance indicates that taking into account the signs of the gradients plays an important role in producing the adversarial effect.
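The two cost-adjustment rules can be contrasted in a few lines of NumPy. This is a sketch, not the authors' TensorFlow code: the `reverse=True` branch follows (15) and (16) as written above, while the `reverse=False` branch assumes that (13) and (14), which are not reproduced in this section, are their exact mirror image, as the phrase "inverse signs" suggests. The function name `adjust_costs` and its arguments are hypothetical.

```python
import numpy as np

def adjust_costs(rho_plus, rho_minus, grad, alpha=2.0, reverse=False):
    """Asymmetrically rescale +1/-1 embedding costs of adjustable elements.

    rho_plus, rho_minus: initial costs of +1 and -1 modifications.
    grad: gradient of the steganalyzer loss w.r.t. each element.
    reverse=False: favour changes opposite to the gradient sign (assumed form of (13), (14)).
    reverse=True : favour changes along the gradient sign (Case I, (15), (16)).
    """
    sign = np.sign(grad)
    if reverse:
        sign = -sign
    # After the optional flip: sign < 0 makes +1 cheaper, sign > 0 makes -1 cheaper.
    q_plus = np.where(sign < 0, rho_plus / alpha,
                      np.where(sign > 0, rho_plus * alpha, rho_plus))
    q_minus = np.where(sign > 0, rho_minus / alpha,
                       np.where(sign < 0, rho_minus * alpha, rho_minus))
    return q_plus, q_minus

rho = np.ones(3)
grad = np.array([1.0, 0.0, -1.0])
q_plus_case1, q_minus_case1 = adjust_costs(rho, rho, grad, alpha=2.0, reverse=True)
```

Elements with a zero gradient keep their original costs in both variants, so only the direction preference changes between AMA and Case I.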

2) Case II: disabling minimum alteration:

In the AMA scheme, the number of adjustable elements is minimized by iteratively finding a minimum value of β for (12). In the comparative experiment, we use a fixed value of β for each image, and thus the number of adjustable elements is the same for all the images. The results obtained for β = 0 . , . , and . are presented in Table IV. It can be observed that as β increases, the missed detection rate of the adversary-unaware steganalyzer increases, but the total error rate of the adversary-aware steganalyzer decreases. The results indicate that increasing the number of adjustable elements makes it easier to fool the targeted steganalyzer. However, an excess of adversarial operations may introduce unnecessary artefacts, leading to easier detection by an adversary-aware steganalyzer. Consequently, it is a better choice to use a “just enough” number of adjustable elements, balancing the performance against an adversary-unaware steganalyzer and an adversary-aware steganalyzer.

F. Supplementary Statistical Information

To further investigate the proposed AMA scheme, we provide some supplementary statistical information on the adversarial stego images as follows.

TABLE V

THE FREQUENCIES OF OCCURRENCES OF β (IN %) IN GENERATING STEGO IMAGE SET $Z_B$ FOR EACH PAYLOAD. THE SUM OF EACH COLUMN IS … $\beta_{fail}$ …

TABLE VI

THE MODIFICATION RATE COMPUTED AS THE CHANGE PER NON-ZERO AC DCT COEFFICIENT (IN %) FOR THE TWO STEGANOGRAPHIC SCHEMES UNDER DIFFERENT PAYLOADS.

| Steganography | 0.1 bpnzAC | 0.2 bpnzAC | 0.3 bpnzAC | 0.4 bpnzAC | 0.5 bpnzAC |
|---|---|---|---|---|---|
| J-UNIWARD [12] | 1.80 | 3.97 | 6.32 | 8.80 | 11.37 |
| Proposed AMA | 1.84 | 4.04 | 6.43 | 8.95 | 11.57 |

1) Frequency of adversarial embedding operation:

To investigate the statistics on how many adjustable elements are used in the AMA scheme, the occurrences of β in generating the . × adversarial stego images $Z_B$ are given in Table V. Based on the statistics, we can make the following observations.

• For a low payload, such as 0.1 bpnzAC, since the steganalyzer is less effective in detecting plain stego images, adversarial embedding is not necessary for a large portion of the stego images, which corresponds to the case of β = 0. As the payload increases, more stego images require adversarial embedding (β ≠ 0).

• A lower failure rate of adversarial embedding is obtained for a higher payload (from 7.52% at 0.1 bpnzAC to 0.47% at 0.5 bpnzAC). This is because more elements are involved in modification as the payload increases. For instance, less than 2% of the elements are modified at 0.1 bpnzAC, while more than 11% are modified at 0.5 bpnzAC, as shown in Table VI. Note that the failure rate is exactly the same as $(1 - P_{md})$ of the adversary-unaware CNN steganalyzer given in Table I.

• For all payloads, larger values of β occur less frequently than lower values. However, this phenomenon cannot be taken for granted since it may be due to the specific images, the baseline steganographic scheme, the targeted steganalyzer, and the step Δβ used to search for the minimum β.
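The search for the minimum β discussed above can be sketched as a simple loop. This is an illustrative outline only: `embed` and `fools_target` are hypothetical callables standing in for the embedding with a given fraction of adjustable elements and for the decision of the targeted steganalyzer, the default step of 0.1 is an assumed value, and returning the last attempt on failure is also an assumption (the section does not say which image is kept in that case).

```python
def minimal_beta_embedding(embed, fools_target, delta_beta=0.1):
    """Search the smallest beta (fraction of adjustable elements) that fools the target.

    embed(beta)         -> stego image produced with a fraction `beta` of adjustable elements
                           (beta = 0 means conventional embedding only)
    fools_target(stego) -> True if the targeted steganalyzer classifies `stego` as cover
    Returns (beta, stego); beta is None if no tested value succeeds.
    """
    steps = int(round(1.0 / delta_beta))
    stego = None
    for k in range(steps + 1):
        beta = round(k * delta_beta, 10)  # avoid floating-point drift in the grid
        stego = embed(beta)
        if fools_target(stego):
            return beta, stego            # minimum alteration: stop at the first success
    return None, stego                    # failure case (assumed handling)

# Toy usage: the "image" is just beta, and the target is fooled once beta >= 0.25.
beta, stego = minimal_beta_embedding(lambda b: b, lambda s: s >= 0.25)
```

With this structure, images that already evade the steganalyzer after conventional embedding end with β = 0, and images that never evade it are counted as failures.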

2) Modiﬁcation rate:

In Section III-C, we stated that adversarial embedding would lead to an increasing number of modified image elements due to the asymmetric costs assigned to the adjustable elements. We define the modification rate as the ratio of the number of changed coefficients to the total number of non-zero AC DCT coefficients. In Table VI, we show the averaged modification rate for J-UNIWARD and AMA under different payloads on the image set $C_B$. As expected, we can observe that the modification rates for AMA are slightly higher than for J-UNIWARD. Besides, the gap in the modification rate between J-UNIWARD and AMA widens as the payload increases (0.04%, 0.07%, 0.11%, 0.15%, and 0.20% for the five payloads, respectively). This is because more cases of β ≠ 0 occur for a higher payload, as indicated in Table V.
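The modification rate defined above can be computed directly from the quantized DCT coefficients. The sketch below assumes the coefficients are stored as a 2-D integer array in block layout, with the DC term at the top-left corner of each 8×8 block; this layout and the function names are assumptions made for illustration.

```python
import numpy as np

def nonzero_ac_count(dct):
    """Count non-zero AC coefficients in a 2-D array of quantized DCT coefficients."""
    ac_mask = np.ones(dct.shape, dtype=bool)
    ac_mask[::8, ::8] = False  # exclude the DC term of every 8x8 block (assumed layout)
    return int(np.count_nonzero(dct[ac_mask]))

def modification_rate(cover_dct, stego_dct):
    """Ratio of changed coefficients to non-zero AC coefficients of the cover."""
    changed = int(np.count_nonzero(cover_dct != stego_dct))
    return changed / nonzero_ac_count(cover_dct)

# Toy 8x8 block: one DC term and four non-zero AC coefficients, one of which is changed.
cover = np.zeros((8, 8), dtype=np.int32)
cover[0, 0], cover[0, 1], cover[1, 0], cover[2, 2], cover[3, 3] = 50, 3, -2, 1, 4
stego = cover.copy()
stego[0, 1] += 1
rate = modification_rate(cover, stego)  # 1 change over 4 non-zero AC coefficients
```

Multiplying the result by 100 gives the percentage reported in Table VI.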

3) Feature distance:

In Section IV-B, we speculated that the good performance of AMA against statistical feature-based steganalyzers may be due to an implicit ability to preserve some statistics. To verify this statement, we compute the Maximum Mean Discrepancy (MMD) [41] between the cover image set and the stego image set in the feature space formed by the 17000-D GFR features [20]. Since computing the MMD over all . × images in $C_B$ requires an extremely large amount of memory, we randomly select 10000 images. The same scaling factor is used for both J-UNIWARD and AMA. The results are shown in Table VII. It can be observed that AMA has a lower MMD value than J-UNIWARD under the same payload, indicating that it preserves the GFR features better, even though its modification rate is higher.

G. Discussion on the Role of Randomizing the Positions of Adjustable Elements

In our previous experiments, the positions of adjustable elements are randomized by using different embedding orders for different images. One question is whether there is a difference in security performance between randomized positions and fixed positions. In order to investigate the role of randomizing the positions of adjustable elements, in the following we report the results of two comparative experiments.

Footnote: The MMD toolbox can be downloaded from http://dde.binghamton.edu/tomas/mmdToolBox.zip.

TABLE VII

THE MMD (IN × − ) FOR THE TWO STEGANOGRAPHIC SCHEMES UNDER DIFFERENT PAYLOADS.

| Steganography | 0.1 bpnzAC | 0.2 bpnzAC | 0.3 bpnzAC | 0.4 bpnzAC | 0.5 bpnzAC |
|---|---|---|---|---|---|
| J-UNIWARD [12] | 3.6 | 27.6 | 95.6 | 239.7 | 499.1 |
| Proposed AMA | 2.5 | 19.2 | 70.4 | 189.3 | 409.8 |

TABLE VIII

THE SECURITY PERFORMANCE (IN %) OF AMA WITH A FIXED EMBEDDING ORDER AGAINST THE ADVERSARY-UNAWARE STEGANALYZER AND THE ADVERSARY-AWARE STEGANALYZER. THE TESTING IMAGE SET IS $\{C_B^{tst}, Z_B^{tst}\}$. PERFORMANCE COMPARISON WITH THE IMPLEMENTATION USING A RANDOMIZED EMBEDDING ORDER IS SHOWN IN THE PARENTHESIS.

| Steganalyzer | 0.1 bpnzAC | 0.2 bpnzAC | 0.3 bpnzAC | 0.4 bpnzAC | 0.5 bpnzAC |
|---|---|---|---|---|---|
| $\phi_{C_B,S_B}$ | ↓ | ↓ | ↑ | ↑ | ↑ |
| $\phi_{C_B^{trn},Z_B^{trn}}$ | ↓ | ↑ | ↓ | ↓ | ↓ |

In the first experiment, we use a fixed embedding order for different images. As indicated in Section III-D, the fixed embedding order results in fixed positions of adjustable elements. We adopt the same setting used in Sections IV-B and IV-C. Adversary-unaware and adversary-aware CNN-based steganalyzers are respectively used for detection. The results are shown in Table VIII. The improvement or deterioration of the performance with respect to the implementation adopting randomized positions is shown in parentheses in the table. It can be observed that AMA with fixed positions of adjustable elements and AMA with randomized positions do not differ noticeably in performance against the CNN-based steganalyzers.

As a second experiment, we use a fixed embedding order and a fixed number of adjustable elements (β = 0 . ) for each image. The payload is set to 0.4 bpnzAC. The results are given in Table IX. The comparison with the implementation using a randomized embedding order (Case II with β = 0 . in Table IV) is shown in parentheses. It can be observed that the performance does not change much for an adversary-unaware steganalyzer, while it degrades greatly for an adversary-aware steganalyzer. This phenomenon is interesting. Although the fixed positions of adjustable elements are not directly leaked to the adversary-aware steganalyzer, the experimental evidence shows that the data-driven steganalyzer can automatically learn such information.
In a similar scenario, when the same key is re-used for data embedding simulation, a CNN-based method [42] is highly effective in detecting different stego images with synchronized modification locations. The performance drops greatly when different keys are used for different images. The phenomenon does not occur for feature-based steganalyzers. We speculate that modifications in the same location may present a chance of “collision attack” from the perspective of CNN-based steganalyzers. The neurons may learn strong activations from the synchronized embedding locations. Since AMA employs minimum alteration, the collision effect is eliminated, even when a fixed embedding order is used, as the results reported in Table VIII show.

Based on the previous discussion, in order to improve the security of the stego images, a steganographer may want to use a random embedding order. However, this may require transmitting a secret key from the sender to the receiver. Some previous works propose to establish a secret channel for sharing this and other kinds of side information, or to embed the side information in the stego media [6]. Another possibility is to compute a robust image hash and use the hash value as a secret key which can be extracted by the receiver. Since the random embedding order does not play an important role in the security of AMA, we do not discuss its implementation in this work.

H. Performance on JPEG-BOSSBase Image Set

In this part, we utilize the image set JPEG-BOSSBase to further evaluate the performance of AMA. The Xu-CNN steganalyzer $\phi_{C_B,S_B}$ trained on Basic500k is still used as the targeted steganalyzer in the AMA scheme. We use three adversary-aware steganalyzers to detect AMA, and use J-UNIWARD as the baseline for comparison. The embedding payload is set to 0.4 bpnzAC. From the results shown in Table X, we can observe that AMA performs better than J-UNIWARD on JPEG-BOSSBase. The results indicate that the good performance of the proposed AMA scheme does not rely much on a specific image set.

V. CONCLUSIONS

TABLE IX

THE SECURITY PERFORMANCE (IN %) OF AMA WITH A FIXED EMBEDDING ORDER AND A FIXED NUMBER OF ADJUSTABLE ELEMENTS (β = 0 . ) AGAINST THE ADVERSARY-UNAWARE STEGANALYZER AND THE ADVERSARY-AWARE STEGANALYZER.

TABLE X

…ASE IMAGE SET UNDER THE PAYLOAD OF BPNZAC

| Steganalyzer | Steganography | Testing Set | $P_{fa}$ | $P_{md}$ | $P_e$ |
|---|---|---|---|---|---|
| $\phi_{C_J,S_J}$ | J-UNIWARD [12] | $\{C_J, S_J\}$ | | | |
| $\phi_{C_J,Z_J}$ | Proposed AMA | $\{C_J, Z_J\}$ | | | |
| $\phi'_{C_J,S_J}$ | J-UNIWARD [12] | $\{C_J, S_J\}$ | | | |
| $\phi'_{C_J,Z_J}$ | Proposed AMA | $\{C_J, Z_J\}$ | | | |
| $\phi''_{C_J,S_J}$ | J-UNIWARD [12] | $\{C_J, S_J\}$ | | | |
| $\phi''_{C_J,Z_J}$ | Proposed AMA | $\{C_J, Z_J\}$ | | | |

In this paper we proposed a novel approach to the steganographic problem; namely, we proposed to embed the stego message while simultaneously taking into account the necessity of countering an advanced CNN-based steganalyzer. Such an aim is achieved by introducing a new adversarial embedding method, which takes both data embedding and adversarial operation into account. A practical steganographic scheme, AMA, which generates adversarial stego images with minimum alteration, has been illustrated to counter a deep learning based targeted steganalyzer. The extensive experiments we have carried out permitted us to reach the following conclusions:

1) When the targeted steganalyzer is accessible to the steganographer but the steganalyst is unaware of the adversarial operation, a high missed detection rate can be achieved by AMA to counter the targeted steganalyzer.

2) When the steganalyst is aware of the adversarial embedding and uses adversarial stego images to re-train the steganalyzer, the proposed AMA leads to a higher detection error rate compared to the state-of-the-art baseline steganographic scheme, for both targeted and non-targeted steganalyzers.

3) When both the steganographer and the steganalyst iteratively adjust their strategies according to the updated knowledge about the other side, the one who makes the last move has a great advantage.

Our approach to adversarial embedding shows a promising way to enhance steganographic security; still, there are several unsolved issues to consider. To start with, the proposed AMA scheme uses only the signs of the gradients. It is worth investigating whether the amplitudes of the gradients can also be helpful. Besides, the foundation of AMA is the accessibility of the gradients backpropagated from the steganalyzer.
It is worth studying how to counter targeted steganalyzers which do not backpropagate gradients to the input, such as those with hand-crafted features. Furthermore, for a complete characterization of the interplay between the steganographer and the steganalyst, it would be interesting to resort to a game-theoretic formulation of the problem [35], [43], [44].

REFERENCES

[1] B. Li, J. He, J. Huang, and Y. Q. Shi, “A survey on image steganography and steganalysis,” Journal of Inf. Hiding and Multimedia Signal Processing, vol. 2, no. 2, pp. 142–172, Apr. 2011.

[2] A. Westfeld and A. Pfitzmann, “Attacks on steganographic systems: Breaking the steganographic utilities Ezstego, Jsteg, Steganos, and S-tools-and some lessons learned,” in Proc. Int. Workshop Inf. Hiding, 1999, pp. 61–75.

[3] J. Fridrich and M. Goljan, “On estimation of secret message length in LSB steganography in spatial domain,” in Proc. SPIE, vol. 5306, Jun. 2004, pp. 23–36.

[4] N. Provos, “Defending against statistical steganalysis,” in Proc. 10th Conf. USENIX Secur. Symposium, vol. 10, 2001.

[5] J. Mielikainen, “LSB matching revisited,” vol. 13, no. 5, pp. 285–287, 2006.

[6] W. Luo, F. Huang, and J. Huang, “Edge adaptive image steganography based on LSB matching revisited,” vol. 5, no. 2, pp. 201–214, Jun. 2010.

[7] T. Pevný, P. Bas, and J. Fridrich, “Steganalysis by subtractive pixel adjacency matrix,” vol. 5, no. 2, pp. 215–224, Jun. 2010.

[8] C. Chen and Y. Q. Shi, “JPEG image steganalysis utilizing both intrablock and interblock correlations,” in , May 2008, pp. 3029–3032.

[9] J. Fridrich and T. Filler, “Practical methods for minimizing embedding impact in steganography,” in Proc. SPIE, vol. 6505, Jan. 2007, p. 650502.

[10] T. Pevný, T. Filler, and P. Bas, “Using high-dimensional image models to perform highly undetectable steganography,” in Proc. Int. Workshop Inf. Hiding, 2010, pp. 161–177.

[11] V. Holub and J. Fridrich, “Designing steganographic distortion using directional filters,” in Proc. IEEE Int. Workshop Inf. Forensics Secur., Dec. 2012, pp. 234–239.

[12] V. Holub, J. Fridrich, and T. Denemark, “Universal distortion function for steganography in an arbitrary domain,” EURASIP Journal on Information Security, vol. 2014, no. 1, pp. 1–13, Jan. 2014.

[13] B. Li, M. Wang, J. Huang, and X. Li, “A new cost function for spatial image steganography,” in Proc. IEEE Int. Conf. Image Process., Oct. 2014, pp. 4026–4210.

[14] L. Guo, J. Ni, and Y. Q. Shi, “Uniform embedding for efficient JPEG steganography,” vol. 9, no. 5, pp. 814–825, May 2014.

[15] W. Zhou, W. Zhang, and N. Yu, “A new rule for cost reassignment in adaptive steganography,” vol. 12, no. 11, pp. 2654–2667, Nov. 2017.

[16] J. Fridrich and J. Kodovský, “Rich models for steganalysis of digital images,” vol. 7, no. 3, pp. 868–882, Jun. 2012.

[17] B. Li, Z. Li, S. Zhou, S. Tan, and X. Zhang, “New steganalytic features for spatial image steganography based on derivative filters and threshold LBP operator,” vol. 13, no. 5, pp. 1242–1257, May 2018.

[18] J. Kodovský and J. Fridrich, “Steganalysis of JPEG images using rich models,” in Proc. Media Watermarking, Security, and Forensics, SPIE, vol. 8303, Feb. 2012, p. 83030A.

[19] V. Holub and J. Fridrich, “Low-complexity features for JPEG steganalysis using undecimated DCT,” vol. 10, no. 2, pp. 219–228, Oct. 2015.

[20] X. Song, F. Liu, C. Yang, X. Luo, and Y. Zhang, “Steganalysis of adaptive JPEG steganography using 2D Gabor filters,” in Proc. 3rd ACM Workshop Inf. Hiding Multimedia Secur., 2015, pp. 15–23.

[21] J. Kodovský, J. Fridrich, and V. Holub, “Ensemble classifiers for steganalysis of digital media,” vol. 7, no. 2, pp. 432–444, Apr. 2012.

[22] S. Tan and B. Li, “Stacked convolutional auto-encoders for steganalysis of digital images,” in Proc. IEEE Asia-Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., Dec. 2014, pp. 1–4.

[23] Y. Qian, J. Dong, W. Wang, and T. Tan, “Deep learning for steganalysis via convolutional neural networks,” in Proc. SPIE, vol. 9409, Mar. 2015, p. 94090J.

[24] G. Xu, H. Z. Wu, and Y. Q. Shi, “Structural design of convolutional neural networks for steganalysis,” vol. 23, no. 5, pp. 708–712, May 2016.

[25] G. Xu, H. Z. Wu, and Y. Q. Shi, “Ensemble of CNNs for steganalysis: An empirical study,” in Proc. 4th ACM Workshop Inf. Hiding Multimedia Secur., 2016, pp. 103–107.

[26] G. Xu, “Deep convolutional neural network to detect J-UNIWARD,” in Proc. 5th ACM Workshop Inf. Hiding Multimedia Secur., 2017, pp. 67–73.

[27] J. Zeng, S. Tan, B. Li, and J. Huang, “Large-scale JPEG steganalysis using hybrid deep-learning framework,” vol. 13, no. 5, pp. 1200–1214, May 2017.

[28] T. Denemark, P. Bas, and J. Fridrich, “Natural steganography in JPEG compressed images,” in Electron. Imag., Jan. 2018.

[29] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in Proc. IEEE Conf. Computer Vision Pattern Recognit., 2015, pp. 427–436.

[30] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in Proc. Int. Conf. Learning Representations, 2014.

[31] Z. Chen, B. Tondi, X. Li, R. Ni, Y. Zhao, and M. Barni, “A gradient-based pixel-domain attack against SVM detection of global image manipulations,” in Proc. IEEE Int. Workshop Inf. Forensics Secur., Dec. 2017, pp. 1–6.

[32] I. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. Int. Conf. Learning Representations, 2015.

[33] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple and accurate method to fool deep neural networks,” in Proc. IEEE Conf. Computer Vision Pattern Recognit., 2016, pp. 2574–2582.

[34] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial examples,” arXiv:1706.06083, 2017. [Online]. Available: https://arxiv.org/abs/1706.06083

[35] M. Barni and F. Pérez-González, “Coping with the enemy: Advances in adversary-aware signal processing,” in , May 2013, pp. 8682–8686.

[36] B. Li, M. Wang, X. Li, S. Tan, and J. Huang, “A strategy of clustering modification directions in spatial image steganography,” vol. 10, no. 9, pp. 1905–1917, Sep. 2015.

[37] T. Denemark and J. Fridrich, “Improving steganographic security by synchronizing the selection channel,” in Proc. 3rd ACM Workshop Inf. Hiding Multimedia Secur., 2015, pp. 5–14.

[38] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Proc. IEEE European Symposium Security Privacy, Mar. 2016, pp. 372–387.

[39] T. Filler, J. Judas, and J. Fridrich, “Minimizing additive distortion in steganography using syndrome-trellis codes,” vol. 6, no. 1, pp. 920–935, Sep. 2011.

[40] P. Bas, T. Filler, and T. Pevný, “Break our steganographic system: The ins and outs of organizing BOSS,” in Proc. Int. Workshop Inf. Hiding, 2011, pp. 59–70.

[41] T. Pevný and J. Fridrich, “Benchmarking for steganography,” in Proc. 10th International Workshop on Information Hiding, ser. Lecture Notes in Computer Science, vol. 5284, Barbara, CA, USA, May 2008, pp. 251–267.

[42] L. Pibre, J. Pasquet, D. Ienco, and M. Chaumont, “Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source mismatch,” in