Feature Bagging for Steganographer Identification
Hanzhou Wu [email protected]
Abstract.
Traditional steganalysis algorithms focus on detecting the existence of steganography in a single object. In practice, one may face a more complex scenario in which one or some of multiple users, also called actors, are guilty of using steganography; this is defined as the steganographer identification problem (SIP). It requires steganalysis experts to design effective and robust detection algorithms to identify the guilty actor(s). The mainstream works use clustering, ensembles and anomaly detection, where distances in a high-dimensional space between the feature sets of actors are computed to find the outlier(s) corresponding to the steganographer(s). However, in a high-dimensional space, feature points can be so sparse that the distances between them become similar to each other, which does not benefit detection. Moreover, it is well known in machine learning that combining techniques such as boosting and bagging can improve detection performance. This motivates us to present a feature bagging approach to the SIP. The proposed work merges the results of multiple detection sub-models, each of whose feature space is randomly sampled from the raw full-dimensional space. We create a new dataset called ImgNetEase, including 5108 images downloaded from a social website, to mimic the real-world scenario. We extract PEV-274 features from the images and take nsF5 as the steganographic algorithm for evaluation. Experiments show that our work significantly improves the detection accuracy on the created dataset in most cases, which demonstrates its superiority and applicability.
Keywords:
Steganographer identification, steganalysis, outlier detection, feature bagging, random subspace.
1 Introduction

Steganalysis aims to reveal the use of steganography in seemingly-normal objects. Traditional steganalysis algorithms mainly focus on detecting the existence of steganography in a single object. This is treated as a binary classification problem, which motivates people to design effective feature extractors [1], [2], [3], [4] and use supervised classifiers such as SVM. Recently, in-depth studies [5], [6] have moved deep learning into steganalysis.

In practice, we may face a more complex scenario in which multiple network actors send sets of media files while one or some of them are using steganography; this is defined as the steganographer identification problem (SIP), first posed by Ker et al. [7]. One might use traditional steganalysis algorithms to find the stego objects and then identify the guilty actor(s). However, the guilty actor(s) may be missed due to a number of false positives caused by the between-object difference [8]. This requires us to propose efficient pooled steganalysis [9] algorithms.

Though the study of the SIP is in its infancy, some works have been reported in the literature. Currently, the state-of-the-art methods [7], [8], [10], [11] mainly use traditional steganalytic features. In their solutions, each actor is represented by a set of feature vectors. The distances between the feature sets corresponding to different actors are computed to measure their similarity. By using hierarchical clustering or local outlier detection, one can collect the suspicious actor(s), which are judged as the steganographer(s).
These methods compute distances in the full-dimensional space, which, however, may not always perform well: points in a high-dimensional space can be sparse, implying that the distances between feature points may become similar to each other, causing many normal actors to be selected as guilty ones.

It is well established that a statistical ensemble of multiple learning algorithms can achieve better prediction performance. To tackle the aforementioned dimensionality problem, we introduce a feature bagging approach in this paper. The proposed approach builds multiple detection sub-models, each of whose feature space is sampled from the original full-dimensional space. By merging the results of the sub-models, the most suspicious actor(s) are judged as the steganographer(s). Experiments show that the proposed work improves detection accuracy, which demonstrates its superiority and applicability.

The rest of this paper is organized as follows. In Section 2, we introduce the proposed approach. Then, we conduct experiments and analysis for performance evaluation in Section 3. Finally, we conclude this paper in Section 4.
2 Proposed Approach

We consider images as the objects. Mathematically, let $A = \{a_1, a_2, ..., a_n\}$ denote the actors and $S(a_i) = \{I^{(i)}_1, I^{(i)}_2, ..., I^{(i)}_m\}$ the images held by actor $a_i$. A detector computes the preprocessed feature vectors for each $a_i$, i.e., $F(a_i) = \{f^{(i)}_1, f^{(i)}_2, ..., f^{(i)}_m\}$. Each $F(a_i)$ $(1 \le i \le n)$ is divided into disjoint sets of identical size, i.e.,

$$P(a_i) = \bigcup_{j=1}^{p} P_j(a_i), \quad (1 \le i \le n), \qquad (1)$$

where $m = p \cdot q$ and $P_j(a_i) = \{f^{(i)}_{jq-q+1}, f^{(i)}_{jq-q+2}, ..., f^{(i)}_{jq}\}$.

Accordingly, each $a_i$ $(1 \le i \le n)$ can be represented by $p$ sets of feature vectors. We call these $p$ sets "$p$ points". Thus, we can collect a total of $p \cdot n$ points, each of which belongs to one of the $n$ actors. It is natural to assume that the distance between an abnormal point and a normal point is larger than that between two normal points. In other words, normal points are densely distributed while abnormal ones are sparsely distributed. Thus, we can utilize anomaly detection for identification, provided a well-designed distance measure is available.

The maximum mean discrepancy (MMD) [12] has been empirically shown to be quite effective for distance measurement. Given observations $X = \{x_i\}_{i=1}^{|X|}$ and $Y = \{y_i\}_{i=1}^{|Y|}$, drawn i.i.d. from $p(x)$ and $q(y)$ defined on $\mathbb{R}^d$, and letting $\mathcal{F}$ be a class of functions $f: \mathbb{R}^d \mapsto \mathbb{R}$, the MMD and its empirical estimate are:

$$\mathrm{MMD}[\mathcal{F}, p, q] = \sup_{f \in \mathcal{F}} \left( \mathbb{E}_{x \sim p(x)} f(x) - \mathbb{E}_{y \sim q(y)} f(y) \right), \qquad (2)$$

$$\mathrm{MMD}[\mathcal{F}, X, Y] = \sup_{f \in \mathcal{F}} \left( \frac{1}{|X|} \sum_{x \in X} f(x) - \frac{1}{|Y|} \sum_{y \in Y} f(y) \right). \qquad (3)$$

Usually, $\mathcal{F}$ is selected as a unit ball in a universal RKHS $\mathcal{H}$ defined on the compact metric space $\mathbb{R}^d$ with kernel $k(\cdot, \cdot)$ and feature mapping $\phi(\cdot)$. The Gaussian and Laplacian kernels are universal.
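As a concrete illustration of the grouping in Eq. (1), the following sketch (our own illustrative code, not the authors'; the function name `partition_points` is hypothetical) chunks an actor's $m$ feature vectors into $p$ disjoint points of $q = m/p$ vectors each:

```python
import numpy as np

def partition_points(features, p):
    """Split an actor's m feature vectors (an m x d array) into p disjoint
    'points' of q = m // p vectors each, following Eq. (1)."""
    m = len(features)
    assert m % p == 0, "Eq. (1) requires m = p * q"
    q = m // p
    # P_j(a_i) = {f_{jq-q+1}, ..., f_{jq}} for j = 1, ..., p
    return [features[j * q:(j + 1) * q] for j in range(p)]

# Example: m = 100 vectors of 274-D PEV features, p = 4 points of q = 25
feats = np.random.randn(100, 274)
points = partition_points(feats, p=4)
```

With $p = m$ each point degenerates to a single feature vector, the case where the Euclidean metric replaces the MMD below.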
It is proven (see Lemma 4 in [13]) that

$$\mathrm{MMD}[\mathcal{F}, p, q] = \left\| \mathbb{E}_{x \sim p(x)} \phi(x) - \mathbb{E}_{y \sim q(y)} \phi(y) \right\|_{\mathcal{H}}. \qquad (4)$$

An unbiased estimate of the MMD is:

$$\mathrm{MMD}[\mathcal{F}, X, Y] = \left[ \frac{1}{|X|(|X| - 1)} \sum_{i \ne j} h[i, j] \right]^{1/2}, \qquad (5)$$

where $|X| = |Y|$ is assumed and

$$h[i, j] = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i). \qquad (6)$$

For any two points, we use the unbiased estimate of the MMD to measure their distance, which has been used in prior arts. Note, however, that when a point contains only one feature vector, we cannot use the MMD since its value always equals zero. In this case, one may use the Euclidean metric, i.e., $d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_2$, or other metrics. By applying anomaly detection, a ranking list for the $pn$ points is determined according to their anomaly scores. We use $pn$ triples $\{(u_i, v_i, w_i)\}_{i=1}^{pn}$ to denote the sorted information, where $u_1 \ge u_2 \ge ... \ge u_{pn}$ are the anomaly scores, $v_i$ denotes the corresponding actor, and $w_i$ is the point index, i.e., $P_{w_i}(v_i) \in P(v_i)$. For each actor $a_i$, we determine a fusion score:

$$s(a_i) = \frac{1}{p} \sum_{j=1}^{pn} (pn + 1 - j) \cdot \delta(v_j, a_i), \quad (1 \le i \le n), \qquad (7)$$

where $\delta(x, y) = 1$ if $x = y$ and $\delta(x, y) = 0$ otherwise. By sorting the fusion scores, we can generate the final ranking list, where the actor with the largest score is the most suspicious.

In this way, we can construct a single anomaly detection system operated on the full feature space, as shown in Algorithm 1.
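Eqs. (5)-(7) can be sketched as follows with the linear kernel $k(\mathbf{x}, \mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$; this is a minimal illustrative reading, not the authors' implementation, and the function names are ours:

```python
import numpy as np

def mmd_unbiased(X, Y):
    """Unbiased MMD estimate of Eq. (5) with a linear kernel; X and Y are
    (m, d) arrays with equal m, and h[i, j] follows Eq. (6)."""
    m = len(X)
    k = lambda a, b: float(a @ b)  # linear kernel
    total = 0.0
    for i in range(m):
        for j in range(m):
            if i != j:
                total += (k(X[i], X[j]) + k(Y[i], Y[j])
                          - k(X[i], Y[j]) - k(X[j], Y[i]))
    # the MMD^2 estimate can be slightly negative; clip before the square root
    return max(total / (m * (m - 1)), 0.0) ** 0.5

def fusion_scores(ranked, actors, p):
    """Eq. (7): 'ranked' holds the triples (u_j, v_j, w_j) sorted by
    decreasing anomaly score u_j; returns the fusion score s(a) per actor."""
    pn = len(ranked)
    s = {a: 0.0 for a in actors}
    for j, (_, v, _) in enumerate(ranked, start=1):
        s[v] += (pn + 1 - j) / p  # (pn + 1 - j) * delta(v_j, a) / p
    return s

# Example: well-separated samples should give a larger MMD than matched ones
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, (20, 5))
B = rng.normal(3.0, 1.0, (20, 5))
C = rng.normal(0.0, 1.0, (20, 5))
```

The separation property exercised by the example is exactly what the detector relies on: abnormal points should sit farther from normal points than normal points do from each other.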
Algorithm 1
Single anomaly detection approach for SIP
Input: $A = \{a_1, ..., a_n\}$, $S(a_i) = \{I^{(i)}_1, ..., I^{(i)}_m\}$, $i \in [1, n]$, $p$. Output:
A ranking list $r$.
1: Extract feature vectors and preprocess them
2: Generate disjoint feature sets with Eq. (1)
3: Apply outlier detection algorithm $\mathcal{A}$ to the $np$ points
4: Determine $\{(u_i, v_i, w_i)\}_{i=1}^{pn}$ and apply Eq. (7)
5: Sort $\{s(a_i)\}_{i=1}^{n}$ and return a ranking list $r$

Algorithm 2
Feature bagging approach for SIP
Input: $A = \{a_1, ..., a_n\}$, $S(a_i) = \{I^{(i)}_1, ..., I^{(i)}_m\}$, $i \in [1, n]$, $p$. Output:
A ranking list $r_F$.
1: Extract feature vectors and preprocess them
2: Generate disjoint feature sets with Eq. (1)
3: for $i = 1 \to T$ do
4:   Produce feature sets with dimension $d_i \in [H/2, H-1]$
5:   Apply outlier detection algorithm $\mathcal{A}_i$ to the $pn$ "new" points
6:   Determine $\{(u_i, v_i, w_i)\}_{i=1}^{pn}$ and apply Eq. (7)
7:   Sort $\{s(a_i)\}_{i=1}^{n}$ and collect a ranking list $r_i$
8: end for
9: Determine the final fusion scores with Eq. (8)
10: Sort $\{s_F(a_1), ..., s_F(a_n)\}$ and return a final ranking list $r_F$

Feature preprocessing is necessary to guarantee accuracy [11]. Feature normalization is a good choice, and other preprocessing methods, such as principal component transformation, may be suitable as well. After normalization, each feature component has zero mean and unit variance. The preprocessing makes the distance measure more meaningful and not significantly affected by noisy components.

A kernel function is required when using the MMD. Ker et al. [11] examined multiple kernels such as the linear kernel, the Gaussian kernel and the centroid 'kernel'. It is believed that alternative kernels have advantages against certain batch embedding strategies. From the viewpoint of computational complexity, both the linear kernel and the centroid kernel are desirable. By default, we recommend the linear kernel, i.e., $k(\mathbf{x}, \mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$. It is proved that the centroid 'kernel' approximates the true linear MMD for large sample sizes (see the Appendix in [11]).

Due to the sparsity of points in a high-dimensional space [14], the distances among points may become similar, which does not benefit anomaly detection. The proposed approach uses feature bagging to deal with this problem and improve detection accuracy, reducing the dimension of the feature vectors. However, the number of feature vectors is unchanged, because we believe that reducing the number of feature vectors could reduce the "signal-to-noise ratio", which does not benefit detection. Indeed, we conducted experiments and found that reducing the number of feature vectors degrades detection performance, though we admit that the decrease in accuracy may also be affected by image diversity and other potential factors.

Mathematically, the proposed work builds $T$ sub-models $\mathcal{M} = \{\mathcal{M}_1, \mathcal{M}_2, ..., \mathcal{M}_T\}$, whose feature dimension vector is denoted by $\mathbf{d} = (d_1, d_2, ..., d_T)$.
Each $d_i$ $(1 \le i \le T)$ is chosen from the range $[H/2, H-1]$, where $H$ is the dimension of the raw full feature space. It is possible that $d_i = d_j$ for some $i \ne j$. Each $\mathcal{M}_i$ $(1 \le i \le T)$ corresponds to a single anomaly detection system similar to Algorithm 1, from which a ranking list $r_i$ is collected. The only difference is that $\mathcal{M}_i$ uses a $d_i$-D random subspace of the original $H$-D space. By further processing $\{r_1, r_2, ..., r_T\}$, the final fusion score for each $a_i$ $(1 \le i \le n)$ is generated as follows:

$$s_F(a_i) = \frac{1}{T} \sum_{j=1}^{T} \left( n + 1 - \sum_{k=1}^{n} k \cdot \delta(r^{(j)}_k, a_i) \right), \qquad (8)$$

where $r_j = (r^{(j)}_1, ..., r^{(j)}_n)$ and $r^{(j)}_k$ denotes the actor with the $k$-th largest anomaly score; namely, for $r_j$, $r^{(j)}_1$ is the most suspicious and $r^{(j)}_n$ the least suspicious. By sorting $\{s_F(a_1), ..., s_F(a_n)\}$, we can generate the final ranking list, where the actor with the largest score is the most suspicious and the smallest score corresponds to the least suspicious. Algorithm 2 shows the procedure.
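To make the procedure concrete, the sketch below (our own illustration, not the authors' code) implements Algorithm 2 in the Euclidean-distance setting where each point is a single feature vector, and substitutes the mean distance to the $k$ nearest neighbours for the LOF score [16] to keep the example short; function names are hypothetical:

```python
import numpy as np

def knn_anomaly_scores(points, k=10):
    """Mean Euclidean distance to the k nearest neighbours -- a lightweight
    stand-in for LOF [16]; 'points' is an (N, H') array."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # ignore self-distances
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

def feature_bagging_rank(points, owners, n_actors, p=1, T=16, k=10, seed=0):
    """Algorithm 2: T sub-models on random subspaces of dimension
    d_i in [H/2, H-1], fused with Eqs. (7) and (8)."""
    rng = np.random.default_rng(seed)
    N, H = points.shape                     # N = p * n points in total
    s_F = np.zeros(n_actors)
    for _ in range(T):
        d_i = int(rng.integers(H // 2, H))  # subspace dimension in [H/2, H-1]
        dims = rng.choice(H, size=d_i, replace=False)
        scores = knn_anomaly_scores(points[:, dims], k)
        # Eq. (7): per-actor fusion score from the per-point ranking
        s = np.zeros(n_actors)
        for j, idx in enumerate(np.argsort(-scores), start=1):
            s[owners[idx]] += (N + 1 - j) / p
        # Eq. (8): the actor ranked r-th in this sub-model gains (n + 1 - r) / T
        for pos, a in enumerate(np.argsort(-s), start=1):
            s_F[a] += (n_actors + 1 - pos) / T
    return np.argsort(-s_F)                 # r_F, most suspicious actor first

# Example: 20 actors, one point each, with one clearly anomalous actor
rng = np.random.default_rng(1)
pts = rng.normal(0.0, 1.0, (20, 30))
pts[7] += 4.0                               # the guilty actor's features shift
ranking = feature_bagging_rank(pts, np.arange(20), 20, T=8, k=5, seed=2)
```

Swapping `knn_anomaly_scores` for a true LOF implementation (e.g., scikit-learn's `LocalOutlierFactor`) and the MMD distance of Eq. (5) recovers the setting evaluated in the paper.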
3 Experiments and Analysis

In this section, we conduct experiments for performance evaluation.
Database:
We take JPEG images as the objects held by actors. To mimic the real-world scenario, we use a web crawler to download images from a leading open Chinese social network site
NetEase. The resultant ImgNetEase database contains 5108 images, among which around 90% have a quality factor (QF) close to 90. The average image height is close to 1120 pixels.

Embedding Algorithm:
Prior arts [10], [11] take F5, nsF5, JPHide&Seek, OutGuess and StegHide as the steganographic algorithms to be tested. We refer the reader to [10], [11] for brief introductions to these steganographic algorithms. It has been shown that, among these algorithms, nsF5 is the most secure. For simplicity, we take nsF5 as the embedding tool in this paper.
Steganalysis Features:
In our experiments, each image is represented by a 274-D feature vector called PEV-274 [2], designed for JPEG steganalysis and previously shown to be effective against nsF5. PEV-274 was used in [7], [8], [11]. For a fair comparison, we take PEV-274 as the feature extractor. Notice that nsF5 is detectable by modern steganalysis features.
Embedding Strategy:
A guilty actor has to choose how to divide a message into multiple pieces, each of which is carried by a selected image. This poses a new optimization problem, which, however, is not the main interest of this paper. For simplicity, we choose the even strategy [15], i.e., all cover images carry the same payload regardless of their secure capacity. Notice that no strategy has been proven to be theoretically optimal.

Fig. 1.
Performance comparison between Ker et al.'s method and its improved version using feature bagging with different QFs: (a) QF = 70, (b) QF = 75, (c) QF = 80, (d) QF = 85 and (e) QF = 90.
Outlier Detection:
The local outlier factor (LOF) [16] is used for anomaly detection. In LOF, unless mentioned otherwise, the integer $k$ specifying the number of nearest neighbors of a point is set to 10. The MMD is chosen as the distance measure when $p \ne m$, and the Euclidean distance when $p = m$. We preprocess the feature vectors by normalization and use the linear kernel for the MMD.

We use $T = 16$ sub-models for feature bagging. All raw images in ImgNetEase are cropped from their central regions to create 5 new image datasets, where images are sized 512 ×
512 and the QFs are 70, 75, 80, 85 and 90, respectively. This simplifies steganalysis, since steganalysis features are sensitive to different quantisation matrices [11], [17]. We denote the datasets by SetCover-70, SetCover-75, SetCover-80, SetCover-85 and SetCover-90. For each dataset, we apply the nsF5 simulator with 5 data-embedding rates, resulting in 5 stego datasets. The data-embedding rates are 0.1, 0.15, 0.2, 0.25 and 0.3 bits per non-zero coefficient (bpnc); e.g., for SetCover-70, we generate 5 datasets, denoted by SetStego-70-0.1, SetStego-70-0.15, SetStego-70-0.2, SetStego-70-0.25 and SetStego-70-0.3.

In each experiment, we take $n = 50$ and $m = 100$. Exactly one guilty actor is simulated by using nsF5. For each combination of parameters, each experiment is repeated 100 times with a random selection of the index of the guilty actor. We use the average rank of the guilty actor as the metric to reflect how well the guilty actor is identified. We take SetCover-70 and SetStego-70-0.1 for illustration. We randomly choose 5000 images from SetCover-70 and randomly divide them into 50 groups, each of which belongs to an actor $a_i$ $(1 \le i \le 50)$. Then we randomly select a guilty index $g$, $1 \le g \le$
50, and replace the cover images held by $a_g$ with the corresponding stego images in SetStego-70-0.1. By extracting steganalysis features and applying outlier detection, we rank all actors according to their anomaly scores. Afterwards, by repeating the process 100 times, we determine the average rank of the guilty actor.

Fig. 1 demonstrates the performance comparison between Ker et al.'s method [8], [11] and its improved version using feature bagging with different QFs. We simulate Ker et al.'s method [8], [11] with the above-mentioned configurations. We use $p = 1$, as we found that $1 < p < m$ cannot achieve a better improvement when feature bagging is applied, which may be due to multiple factors such as feature sensitivity, image diversity and reduction of the 'signal-to-noise ratio'.

It is observed from Fig. 1 that better detection performance can be achieved with a smaller QF, no matter whether feature bagging is applied or not. This follows the empirical results in steganalysis. When QF = 90, the average ranks of the guilty actor at the different embedding rates are all close to 25, which corresponds to random guessing. Moreover, for the different QFs, when the embedding rate is relatively low (e.g., 0.1 bpnc), the detection performance also corresponds to random guessing. This indicates the difficulty of steganalysis at low data embedding rates. It can also be seen that, with feature bagging, the detection performance can be further improved in most cases, which shows the potential of feature bagging. Indeed, in some cases, feature bagging provides worse performance, which is normal as we did not optimize the feature selection. (The nsF5 simulator is available at http://dde.binghamton.edu/download/nsf5simulator/.)

Fig. 2.
Detection performance using the Euclidean distance ($p = m$).

Fig. 1 (e) has shown that Ker et al.'s method and the method using feature bagging are equivalent to random guessing with the corresponding parameters. One might think that this is mainly due to the steganalysis features (PEV-274). However, we point out that it could be mainly due to the MMD distance: we find that, surprisingly, replacing the MMD distance with the Euclidean distance (where $p = m$ is required) results in effective detection performance, as observed in Fig. 2. As shown in Fig. 2, feature bagging still has the potential to improve the performance. And, though the Euclidean distance does not outperform the MMD distance when QF = 70, it provides effective and better performance when QF = 90. This indicates that, regardless of the steganalysis features, a well-designed distance measure is required to achieve superior detection performance, which should be a core topic for the SIP.

4 Conclusion

In this paper, we present a simple but effective feature bagging approach to the SIP. The proposed work combines the detection results of sub-models, each of whose feature space is randomly sampled from the raw full-dimensional space. Experimental results show that our work improves the detection performance in most cases, which demonstrates its superiority. From the viewpoint of performance optimization, there is still room for improvement. For example, one may design a specific feature selection algorithm, rather than random selection, to choose efficient feature components for detection. Designing effective steganalysis features is also necessary. Our future focus will be on steganalysis features and the design of distance measures between feature sets.
References
1. Y. Shi, C. Chen, and W. Chen. A Markov process based approach to effective attacking JPEG steganography. Proc. IH, pp. 249-264, 2006.
2. T. Pevny and J. Fridrich. Merging Markov and DCT features for multiclass JPEG steganalysis. Proc. SPIE, pp. 650503.1-14, 2007.
3. T. Pevny, P. Bas, and J. Fridrich. Steganalysis by subtractive pixel adjacency matrix. IEEE TIFS, vol. 5, no. 2, pp. 215-224, 2010.
4. J. Fridrich and J. Kodovsky. Rich models for steganalysis of digital images. IEEE TIFS, vol. 7, no. 3, pp. 868-882, 2012.
5. G. Xu, H. Wu, and Y. Shi. Structural design of convolutional neural networks for steganalysis. IEEE Signal Process. Lett., vol. 23, no. 5, pp. 708-712, 2016.
6. G. Xu, H. Wu, and Y. Shi. Ensemble of CNNs for steganalysis: an empirical study. Proc. ACM IH&MMSec, pp. 103-107, 2016.
7. A. Ker and T. Pevny. A new paradigm for steganalysis via clustering. Proc. SPIE, pp. 78800U.1-14, 2011.
8. A. Ker and T. Pevny. Identifying a steganographer in realistic and heterogeneous data sets. Proc. SPIE, pp. 83030N.1-13, 2012.
9. A. Ker. Batch steganography and pooled steganalysis. Proc. IH, pp. 265-281, 2006.
10. F. Li, K. Wu, J. Lei, M. Wen, Z. Bi, and C. Gu. Steganalysis over large-scale social networks with high-order joint features and clustering ensembles. IEEE TIFS, vol. 11, no. 2, pp. 344-357, 2016.
11. A. Ker and T. Pevny. The steganographer is the outlier: realistic large-scale steganalysis. IEEE TIFS, vol. 9, no. 9, pp. 1424-1435, 2014.
12. A. Gretton, K. Borgwardt, and M. Rasch. A kernel method for the two-sample-problem. Proc. NIPS, pp. 512-520, 2007.
13. A. Gretton, K. Borgwardt, M. Rasch, B. Scholkopf, and A. Smola. A kernel two-sample test. J. Machine Learning Research, vol. 13, no. 3, pp. 723-773, 2012.
14. C. Aggarwal and P. Yu. Outlier detection for high dimensional data. ACM SIGMOD Record, vol. 30, no. 2, pp. 37-46, 2001.
15. A. Ker and T. Pevny. Batch steganography in the real world. Proc. ACM IH&MMSec, pp. 1-10, 2012.
16. M. Breunig, H. Kriegel, R. Ng, and J. Sander. LOF: identifying density-based local outliers. ACM SIGMOD Record, vol. 29, no. 2, pp. 93-104, 2000.
17. T. Pevny and J. Fridrich. Multiclass detector of current steganographic methods for JPEG format. IEEE TIFS, vol. 3, no. 4, pp. 635-650, 2008.