Camera Fingerprint Extraction via Spatial Domain Averaged Frames
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 1
Samet Taspinar, Manoranjan Mohanty, and Nasir Memon
Abstract—Photo Response Non-Uniformity (PRNU) based camera attribution is an effective method to determine the source camera of visual media (an image or a video). To apply this method, images or videos need to be obtained from a camera to create a "camera fingerprint" which can then be compared against the PRNU of the query media whose origin is in question. The fingerprint extraction process can be time-consuming when a large number of video frames or images have to be denoised. This may need to be done when the individual images have been subjected to high compression or other geometric processing such as video stabilization. This paper investigates a simple, yet effective and efficient, technique for creating a camera fingerprint when many still images need to be denoised. The technique utilizes Spatial Domain Averaged (SDA) frames. An SDA-frame is the arithmetic mean of multiple still images. When it is used for fingerprint extraction, the number of denoising operations can be significantly decreased with little or no performance loss. Experimental results show that the proposed method can work many times faster than conventional methods while providing similar matching results.

Index Terms—PRNU, video forensics, camera fingerprint extraction, image forensics.
I. INTRODUCTION
PRNU-based source camera attribution is a well-studied and successful method in media forensics for finding the source camera of an anonymous image or video [1]. The method is based on the unique Photo Response Non-Uniformity (PRNU) noise of a camera sensor array stemming from manufacturing imperfections. This PRNU noise can act as a camera fingerprint. The PRNU approach is often used in two scenarios: camera verification and camera identification. Camera verification aims to establish whether a given query image or video was taken by a suspect camera. This is done by correlating the noise estimated from the query image or video with the fingerprint of the camera, which is usually computed by taking pictures with the camera under controlled conditions. In camera identification, the potential source camera of the query image or video is determined from a large database of camera fingerprints. One can view camera identification as essentially the same as performing n camera verification tasks, where n is the number of camera fingerprints in the database. However, when performing identification, it is assumed that the camera fingerprints are pre-computed.

In both verification and identification, it is often the case that there is no camera available to create fingerprints under controlled conditions. Rather, camera fingerprints are estimated from a set of publicly available media assumed to be from the same camera. Such media can have a very diverse range of quality and content and often lacks metadata.

Samet Taspinar (email: [email protected]) and Manoranjan Mohanty (email: [email protected]) are with the Center for Cyber Security, New York University Abu Dhabi, UAE. Nasir Memon (email: [email protected]) is with the Department of Computer Science and Engineering, New York University, New York, USA.

For efficient fingerprint matching in large databases, various approaches have been proposed. Fridrich et al.
[2] proposed the use of fingerprint digests, in which a subset of fingerprint elements having the highest sensitivity is used instead of the entire fingerprint. Bayram et al. [3] introduced binarization, where each fingerprint element is represented by a single bit. Valsesia et al. [4] proposed applying random projections to reduce the fingerprint dimension. Bayram et al. [5] introduced group testing via composite fingerprints, which focuses on decreasing the number of correlations rather than decreasing the size (storage) of a fingerprint. Recently, Taspinar et al. [6] proposed a hybrid approach that decreases both the size of a fingerprint and the number of correlations. All these methods were designed and tested for images; however, they can also be used for videos.

Although the image-centric PRNU-based method can be extended to video [7]–[9], source camera attribution with video presents a number of new challenges. First, a video frame is much more compressed than a typical image. Therefore, the PRNU signal extracted from a video frame is of significantly lower quality than one obtained from an image. As a result, a larger number of video frames is required to compute the fingerprint. In fact, Chuang et al. [7] found that it is best to use all the frames instead of only the I- or P-frames to compute a fingerprint. Using a large number of frames can introduce significant computation overhead. For example, computing a fingerprint from the I-frames of a one-minute HD video requires one to two minutes, whereas using all frames takes substantially longer.

In the case of camera identification, the amount of computation can be prohibitive in practical scenarios. For example, computing fingerprints from a thousand one-minute Full HD videos (using all frames) on a PC may take many days. Clearly, with billions of media objects uploaded every day on the Internet, large-scale camera source identification quickly becomes infeasible.
Although camera fingerprints stored in a database may have to be computed just once by a system, computing a fingerprint estimate at run-time from a query video can be prohibitive when faced with a reasonable number of query videos presented to the camera identification system in a day.

Besides source camera identification, digital stabilization operations performed within modern cameras also present a significant challenge for PRNU-based source camera verification for video [8], [10], [11]. Video stabilization results in sensor-pixel misalignments between individual frames of the video because the geometric transformations performed to compensate for camera motion and spatially align each frame differ from frame to frame. An accurate camera fingerprint cannot be obtained from misaligned frames as is done with non-stabilized video, even if the video quality is very high. Although there are some preliminary methods that address source camera verification for stabilized video [8], [10], these methods are either limited in scope or have low performance (low true positive rate) and high computation overhead. An alternate approach to address the stabilization issue for a fairly long video (at least a couple of minutes) [12] is to use a large number of frames for computing the fingerprint. The idea is that with a large number of frames, there will be a sufficient number of aligned pixels at each spatial location, allowing the computation of an accurate fingerprint. As discussed above, however, this approach can again introduce computation overhead unsuitable for practical use.

As a third example, modern devices such as smartphones capture different types of media at different resolutions. For example, most cameras do not use the full sensor resolution when capturing a video; they downsize the sensor output to a lower resolution using proprietary and often unknown in-camera processing techniques.
For such a challenging task, PRNU-based source camera matching may often fail if only I-frames are used.

This paper proposes a computationally efficient way to compute a camera fingerprint from a large number of media objects, such as the individual frames of a video or a large number of highly compressed images taken from a social media platform. In contrast to the two-step conventional fingerprint computation method (which first estimates the PRNU noise of each frame using a denoising filter and then averages the individual PRNU noise estimates to get a reliable fingerprint estimate), the proposed method uses a three-step approach: frame averaging, denoising, and noise averaging. The frame averaging step takes the arithmetic mean of the frames in the spatial domain, resulting in a
Spatial Domain Averaged frame (SDA-frame) (Figure 2). In the second step, each SDA-frame is denoised, and an averaging of the estimated PRNU noise is then done to arrive at the final fingerprint estimate. The goal here is to minimize the number of denoising operations (as denoising is the most expensive step), and also to suppress scene-dependent noise by averaging multiple frames. Experiments with the VISION dataset [13] and NYUAD-MMD [14] show that the proposed method provides significant speedup in computing accurate fingerprints. It achieves a significantly higher true positive rate than a fingerprint computed from I-frames only, and a much lower computation cost than a fingerprint obtained from all available frames while yielding similar performance.

The rest of the paper is organized as follows. Section II summarizes the PRNU-based method and provides an overview of how digital video stabilization works. Section III explains the proposed fingerprint extraction method using SDA-frames, along with an analysis comparing it with the conventional approach. The insights obtained from the analysis are experimentally validated in Section IV. Section V examines applications for which the SDA-frame-based technique can be used and reports the improvements that can be achieved using an SDA-based method in those cases. Section VI provides a discussion of future work and concludes the paper.

II. BACKGROUND AND RELATED WORK
In this section, we provide a brief review of PRNU-based source camera attribution and video stabilization.
A. PRNU-based Source Camera Attribution
PRNU-based camera attribution is established on the fact that the output of the camera sensor, I, can be modeled as

I = I^(0) + I^(0) K + ψ,    (1)

where I^(0) is the noise-free still image, K is the PRNU noise, and ψ is the combination of additional noise, such as readout noise, dark current, shot noise, content-related noise, and quantization noise. The multiplicative PRNU noise pattern, K, is unique for each camera and can be used as a camera fingerprint, which enables the attribution of visual media to its source camera. Using a denoising filter F (such as a wavelet filter) on a set of images (or video frames) of a camera, we can estimate the camera fingerprint by first getting the noise residual, W_k (i.e., the estimated PRNU), of the k-th image as W_k = I_k − Î_k^(0), with Î_k^(0) = F(I_k), and then averaging the noise residuals of all the images. To determine whether a specific camera has taken a given query image, we first obtain the noise residual of the query image using F and then correlate this noise residual with the camera fingerprint estimate.

For images, the PRNU-based method has been well studied. Following the seminal work in [1], much research has been done to improve the scheme [15]–[19] and to make camera identification effective in practical situations [2], [3], [5], [6], [20]. Researchers have also studied the effectiveness of the PRNU-based method by proposing various counter-forensics and anti-counter-forensics methods [21], [22]. It has also been shown that the PRNU method can withstand a multitude of image processing operations, such as cropping, scaling [23], compression [24], [25], blurring [24], and even printing and scanning [26].

In contrast, less work has been dedicated to PRNU-based camera attribution from video [27]. Mo Chen et al. [28] first extended the PRNU-based approach to camcorder videos.
They used Normalized Cross-Correlation (NCC) to correlate fingerprints calculated from two videos, as the videos may be subject to translation shift, e.g., due to letter-boxing. To compensate for the blockiness artifacts introduced by heavy compression (such as MPEG-x and H.26x compression), they discard the boundary pixels of a block (e.g., a JPEG block). In [29], McCloskey proposed a confidence weighting scheme that can improve PRNU estimation from a video by minimizing the contribution of regions of the scene that are likely to distort the PRNU noise (e.g., excluding high-frequency content). Chuang et al. [7] studied the PRNU-based source camera identification problem with a focus on smartphone cameras. Since smartphone videos are subject to high compression, they considered only I-frames for fingerprint calculation and correlation. Chen et al. [9] proposed a method to find PRNU noise from wireless streaming videos, which are subject to blocking and blurring. In their approach, they divided a video frame into multiple blocks and discarded the blocks having significant blocking or blurring artifacts. Chuang et al. [7] showed that the best possible fingerprint can be computed when all the frames are considered (instead of using only the I- or P-frames). However, to the best of our knowledge, efficient computation of a fingerprint from a given video is a relatively unexplored area.
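The estimation and matching pipeline reviewed above (per-image noise residuals, residual averaging, and correlation against a query residual) can be sketched as follows. This is a minimal illustration, not the implementation used in the literature: a simple box blur stands in for the wavelet denoising filter F, and plain residual averaging is used for the fingerprint.

```python
import numpy as np

def denoise(img, k=3):
    """Stand-in for the denoising filter F: a simple box blur.
    (The cited works use a wavelet-based filter; this is only a sketch.)"""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def noise_residual(img):
    """W = I - F(I): the per-image PRNU noise estimate."""
    img = np.asarray(img, dtype=np.float64)
    return img - denoise(img)

def fingerprint(images):
    """Average the residuals of many images to estimate the fingerprint K."""
    return np.mean([noise_residual(im) for im in images], axis=0)

def ncc(a, b):
    """Normalized correlation between a query residual and a fingerprint."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

On synthetic data generated from the multiplicative model I = I^(0)(1 + K) + noise, the residual of a same-camera image correlates markedly higher with the fingerprint than the residual of an image from a different camera.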
B. Affine Transformation in Video Stabilization
Fig. 1: Video stabilization pipeline. This figure is a modified version of a figure that appeared in [30].

An out-of-camera digital video stabilization process contains three major stages: camera motion estimation, motion smoothing, and motion correction (Figure 1) [30], [31]. In the motion estimation step, the global inter-frame motion between adjacent frames of a non-stabilized video is modeled from the optical flow vectors of the frames using an affine transformation. In the motion smoothing step, unintentional translations, rotations, and shearing are filtered out of the global motion vectors using a low-pass filter. Finally, in the motion correction step, the stabilized video is created by shifting, rotating, shearing, or zooming frames according to the parameters in the filtered motion vector. Since each video frame can use different parameters, pixels can be misaligned with the sensor array. For example, one frame can be rotated by -1 degree while another by 0.5 degrees.

Digital video stabilization presents a big challenge for PRNU-based camera attribution. The frame-specific affine transformations described above make the PRNU method ineffective, as there is misalignment between frames. The brute-force methods [10], [22] proposed to address the stabilization issue have had limited success and low performance. These methods try to overcome the desynchronization issue by first finding the stabilization parameters through an exhaustive search and then performing the corresponding inverse affine transformation. Such methods, therefore, have very high computation overhead. Recently, Mandelli et al. [11] improved over brute-force approaches by using a best-fit reference frame in the parameter-search process rather than the first frame of the given video. The best-fit reference frame is obtained by looking for a frame that matches the largest number of frames. Their approach also has high computation overhead.

III. SPATIAL DOMAIN AVERAGING
As mentioned in the introduction, this paper proposes spatial domain averaging for computing camera fingerprints, which reduces the number of denoising operations when many visual objects are available. In the proposed method, efficient computation of a fingerprint is achieved by first creating averaged frames from a large collection, and then using these averaged frames for computing the fingerprint. For example, given a video with m frames, g non-intersecting equal-sized subgroups are formed, each with d = m/g frames. A Spatial Domain Averaged frame (SDA-frame) is created from each subgroup by taking the mean of the d frames in the subgroup. Then, in the second step, each SDA-frame is denoised, and an averaging of the estimated PRNU noise patterns is done to arrive at the final camera fingerprint estimate. In this manner, the number of frames that are denoised is reduced by a factor of d. An SDA-frame obtained from three different images is shown in Figure 2.

Fig. 2: An SDA-frame (d) formed as the average of the 1st (a), 2nd (b), and 3rd (c) frames.

The proposed method is inspired by the fact that although the denoising filter is designed to remove random noise originating from the camera sensor (e.g., readout noise, shot noise, dark current noise), as well as noise caused by processing (e.g., quantization and compression), it cannot do a perfect job. Therefore, some scene content leaks into the extracted noise pattern. Averaging in the spatial domain acts as a preliminary filter that smoothens the image and potentially reduces the content noise that leaks into the extracted noise pattern. Of course, the effectiveness of the approach then depends on the nature of the two noise signals. Below we analyze this fact and characterize the relationship between the noise signal arrived at by the conventional approach and by the SDA approach.

Further, when using the proposed approach, many questions arise.
First, does frame averaging lead to a drop in the accuracy of the computed fingerprint compared to the conventional method, assuming the same number of images is used for both? If so, what is the trade-off between the decrease in computation and the loss in accuracy? Can accuracy be increased by utilizing more images in the SDA method? If so, what is the optimal combination of averaging and denoising that yields the least computation with the best performance? We investigate these questions both theoretically and experimentally. We first provide a mathematical analysis using a simple framework in the two subsections below. We then validate our study in the next section by providing experimental results. The results show that the spatial domain averaging strategy can indeed result in significant savings in computation while maintaining performance and, in some cases, improving it.

The rest of this section provides an analysis of spatial domain averaging. To this end, we first analyze the conventional method and then the SDA method.
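The grouping-and-averaging step at the heart of the method can be sketched as follows. This is a minimal illustration; the function name and the choice to drop leftover frames that do not fill a group are ours, not the paper's.

```python
import numpy as np

def sda_frames(frames, d):
    """Split m frames into g = m // d disjoint groups of SDA-depth d and
    return the arithmetic mean of each group (one SDA-frame per group)."""
    m = len(frames)
    g = m // d  # leftover frames beyond g * d are dropped in this sketch
    stack = np.asarray(frames[:g * d], dtype=np.float64)
    return stack.reshape(g, d, *stack.shape[1:]).mean(axis=1)
```

For a 300-frame video with d = 50, this yields 6 SDA-frames, so the subsequent (expensive) denoising step runs 6 times instead of 300.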
A. Conventional method
As discussed in Section II, in the conventional method the camera fingerprint is estimated from n images from a known camera. Each image I can be modeled as I = I^(0) + I^(0) K + ψ, where ψ is the random noise accumulated from a variety of sources (as in (1)) and K is the PRNU noise.

To estimate K, a denoising filter F, such as [32] or BM3D [33], is used to estimate the noise-free signal I^(0). Using such a filter, we denote the noise residual as W = I^(0) K + ψ + ξ, where ξ is the content noise. This noise is essentially due to the sub-optimal denoising filter, which is unable to completely separate the content from the PRNU noise. Then, from the n known images, the camera fingerprint estimate, K̂, can be obtained using Maximum Likelihood Estimation (MLE) as

K̂ = (Σ_{i=1}^{n} W_i I_i) / (Σ_{i=1}^{n} I_i²),    (2)

where W_i is the noise pattern extracted from I_i.

Note that in the estimated camera fingerprint, K̂, ψ and ξ are the unwanted noise. The quality of K̂ can be assessed from its variance Var(K̂) [34]: the lower the variance (i.e., for images with smooth content), the higher the quality. Assuming that ψ and ξ are independent white Gaussian noise with variances σ_ψ² and σ_ξ², respectively, Var(K̂) can be bounded (using the Cramer-Rao Lower Bound, as shown by Fridrich et al. [34]) as

Var(K̂) ≥ (σ_ψ² + σ_ξ²) / (Σ_{i=1}^{n} I_i²).    (3)

Thus a better PRNU estimate is obtained for lower σ_ψ² and σ_ξ² (i.e., high-luminance and low-texture images [34]).

B. Proposed SDA method
In this subsection, we derive the variance of the estimated camera fingerprint obtained using frame averaging. We then compare this variance with that obtained by the conventional approach (in (3)).

Suppose I_1, I_2, . . . , I_m are m images used to compute the camera fingerprint using the SDA method. With frame averaging, these m images are divided into g = m/d disjoint sets of equal size, with d pictures in each set. From each set, an SDA-frame is computed. Thereafter, the process is similar to the conventional approach: each SDA-frame is denoised, and the camera fingerprint is computed from the g noise residuals using MLE. Suppose I_i^SDA is the SDA-frame obtained from the i-th image set. Then

I_i^SDA = (1/d) Σ_{j=(i−1)d+1}^{id} I_j = (1/d) Σ_{j=(i−1)d+1}^{id} (I_j^(0) + I_j^(0) K + ψ_j).

We can write the above equation as

I_i^SDA = I_i^(0),SDA + I_i^(0),SDA K + ψ_i^SDA,    (4)

where I_i^(0),SDA is the noise-free image and ψ_i^SDA is the random noise (from pre-filtering sources) in the SDA-frame. This noise can be written as ψ_i^SDA = (1/d) Σ_{j=(i−1)d+1}^{id} ψ_j. Suppose σ_ψ² is the variance of the ψ's (assumed to be white Gaussian noise). Then the variance of ψ_i^SDA turns out to be σ_ψ²/d.

Suppose W^SDA is the noise residual of an SDA-frame, I^SDA. Then

W^SDA = I^SDA − F(I^SDA) = I^(0),SDA K + ψ^SDA + ξ′,

where F is the denoising filter and ξ′ = I^(0),SDA − F(I^SDA) is the content noise due to the sub-optimal nature of the denoising filter. Note that ξ′ is assumed to be independent of the PRNU signal I^(0),SDA K (although ξ′ contains content leakage from I^(0),SDA − F(I^SDA)), as ξ′ is negligible compared to I^SDA K [34].

We know that ξ′ depends on the smoothness of the SDA-frames.
If the frames contain textured content, ξ′ is high. Assuming that SDA-frames have smoothness similar to the input frames from which they are created, we take ξ′ and ξ to have the same variance σ_ξ².

Using MLE, the camera fingerprint can now be estimated from the g SDA-frames I_1^SDA, I_2^SDA, . . . , I_g^SDA as

K̂_SDA = (Σ_{i=1}^{g} W_i^SDA I_i^SDA) / (Σ_{i=1}^{g} (I_i^SDA)²).

Using the Cramer-Rao Lower Bound, the variance of the estimated fingerprint K̂_SDA becomes
Var(K̂_SDA) ≥ (σ_ψ²/d + σ_ξ²) / (Σ_{i=1}^{g} (I_i^SDA)²).    (5)

In the ideal case, the averaging operation does not degrade the quality of the PRNU estimated from the SDA-frames. In other words, we want Var(K̂_SDA) to be approximately equal to the variance from the conventional method, Var(K̂). That is, using the results from (3) and (5), it is desired that

(σ_ψ²/d + σ_ξ²) / (Σ_{i=1}^{g} (I_i^SDA)²) ≈ (σ_ψ² + σ_ξ²) / (Σ_{i=1}^{n} I_i²).
By simplifying the above equation, we get

(σ_ψ²/d + σ_ξ²) / (σ_ψ² + σ_ξ²) ≈ (Σ_{i=1}^{g} (I_i^SDA)²) / (Σ_{i=1}^{n} I_i²).

Suppose

(Σ_{i=1}^{g} (I_i^SDA)²) / (Σ_{i=1}^{n} I_i²) = (g/n) × k, where k = ((Σ_{i=1}^{g} (I_i^SDA)²)/g) / ((Σ_{i=1}^{n} I_i²)/n).

Note that the temporary variable k is less than or equal to 1, as the numerator (Σ_{i=1}^{g} (I_i^SDA)²)/g is less than or equal to the denominator (Σ_{i=1}^{n} I_i²)/n. Putting these values in the above equation, we get

(g/n) × k ≈ (σ_ψ²/d + σ_ξ²) / (σ_ψ² + σ_ξ²).

Putting g = m/d in the above equation, we get

(m × k) / (d × n) ≈ (σ_ψ² + d σ_ξ²) / (d (σ_ψ² + σ_ξ²)),

or

m ≈ (n/k) × (σ_ψ² + d σ_ξ²) / (σ_ψ² + σ_ξ²).    (6)

We then discard the temporary variable k from the equation. Since 0 < k ≤ 1, the final relation becomes

m ≥ n × (σ_ψ² + d σ_ξ²) / (σ_ψ² + σ_ξ²).    (7)

From (7), we can derive the following concluding remarks:
• Since d ≥ 1, the right-hand side of (7) is at least n. Therefore, the number of images required by the proposed SDA method (i.e., m) will be greater than or equal to the number required by the conventional method (i.e., n).
• For smooth images, σ_ξ² is close to zero, so the impact of the SDA-depth d is negligible. For such images, SDA and the conventional approach will have similar performance, but the SDA technique will be up to d times faster.
• For textured images, when the number of images for both techniques is equal (i.e., m = n), the conventional approach is expected to outperform SDA because σ_ξ² is greater than zero.
• Since σ_ξ² is greater than zero for textured images, the ratio of images required by the SDA approach to those required by the conventional approach, m/n, will increase as the SDA-depth d increases. Therefore, the SDA approach will require more images to achieve the same performance on textured images.

Notice that it is hard to characterize the relationship between σ_ψ² and σ_ξ²; moreover, σ_ψ² depends on various factors such as shot noise, exposure time, temperature, illumination, and image content.
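The bound in (7) can be checked numerically. The variance values below are arbitrary illustrative choices, not measurements from the paper.

```python
def sda_image_requirement(n, d, var_psi, var_xi):
    """Lower bound (7) on m, the number of images the SDA method (depth d)
    needs to match a conventional fingerprint built from n images.
    var_psi and var_xi are the variances of the random noise (psi) and the
    content noise (xi), respectively."""
    return n * (var_psi + d * var_xi) / (var_psi + var_xi)

# Smooth images (var_xi ~ 0): the bound stays at n regardless of depth.
smooth = sda_image_requirement(n=30, d=50, var_psi=1.0, var_xi=0.0)
# Textured images (var_xi comparable to var_psi): the bound grows with d.
textured = sda_image_requirement(n=30, d=10, var_psi=1.0, var_xi=1.0)
```

With var_xi = 0 the bound equals n = 30 even at depth 50, while with equal variances and d = 10 it rises to 165, matching the remarks that smooth content tolerates large depths and that m/n grows with depth for textured content.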
Therefore, we do not focus on their relationship in this work. In the following section, we experimentally validate the observations listed above.

IV. VALIDATION OF ANALYSIS
In this section, we experimentally verify the main conclusions arrived at by the analysis in the previous section. In our experiments we use both flatfield and textured images from the VISION dataset [13]. The implementation was done in Matlab 2016a on a Windows 7 PC with 32 GB memory and an Intel Xeon E5-2687W v2 @ 3.40 GHz CPU. The wavelet denoising algorithm [32] was used to obtain fingerprints and PRNU noise, and PCE and NCC were used for comparison. A preset threshold of 60 [35] was used for PCE values: values above this threshold were taken to conclude that the two media objects originated from the same camera.
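The PCE statistic used for matching can be sketched as the squared correlation peak over the mean off-peak energy of the circular cross-correlation surface. This is an illustrative implementation; the size of the exclusion neighborhood around the peak is our choice, not necessarily the one used in [35].

```python
import numpy as np

def pce(residual, fingerprint, exclude=5):
    """Peak-to-Correlation-Energy sketch: squared correlation peak divided
    by the mean squared value of the off-peak correlation surface."""
    a = residual - residual.mean()
    b = fingerprint - fingerprint.mean()
    # Circular cross-correlation surface via FFT.
    cc = np.real(np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))))
    peak_idx = np.unravel_index(np.argmax(np.abs(cc)), cc.shape)
    peak = cc[peak_idx]
    # Exclude an (2*exclude+1)^2 neighborhood around the peak (wrapping).
    mask = np.ones(cc.shape, dtype=bool)
    ys = [(peak_idx[0] + dy) % cc.shape[0] for dy in range(-exclude, exclude + 1)]
    xs = [(peak_idx[1] + dx) % cc.shape[1] for dx in range(-exclude, exclude + 1)]
    mask[np.ix_(ys, xs)] = False
    energy = np.mean(cc[mask] ** 2)
    return float(peak ** 2 / (energy + 1e-12))
```

On synthetic residuals, a residual carrying the fingerprint pattern produces a PCE far above the 60 threshold, while an unrelated residual stays far below it.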
A. Studying the effect of smoothness
To verify the observations of the analysis related to the smoothness of the images used to compute a camera fingerprint, we randomly selected 50 flatfield and 50 textured images from each camera in the dataset. For each of these types, five experiments were conducted using random image sets of five different sizes for computing the fingerprint. For example, when we chose 30 flatfield images, we created one fingerprint using the conventional approach by denoising each of the 30 images and then averaging the PRNU noise patterns to arrive at the fingerprint estimate. A fingerprint estimate using the SDA approach was then computed by first averaging the same 30 images in the spatial domain and then denoising this SDA-frame of depth 30 to directly arrive at another fingerprint estimate. Therefore, a total of 20 fingerprints was obtained for each camera (2 types of images × 2 fingerprint extraction techniques × 5 cardinalities of image sets).

Each of these fingerprints was correlated with the PRNU noise obtained from the rest of the images in the dataset taken with the same camera. This test set consisted of both textured and flatfield images. To create an abundance of test cases, we divided each full-resolution fingerprint into disjoint blocks and correlated them with the corresponding blocks in the test images to match the PRNU noise, resulting in a large number of comparisons.

Fig. 3: The effect of texture in terms of PCE.

Fig. 3 shows how image content affects the PCE for fingerprints obtained from flatfield and textured image sets of each size. The figure shows that with flatfield images, despite the significantly lower number of denoising operations performed by the SDA approach, the results obtained are similar to the conventional approach. This observation holds regardless of the number of images averaged for fingerprint extraction.
The performance of the SDA approach drops for textured images. However, this difference can be overcome by increasing the number of images used by the SDA technique while still keeping the number of denoising operations lower than in the conventional approach. We investigate this issue in the next subsection.

If we consider the above results in terms of TPR, the SDA approach starts doing better once the PCE is thresholded at a set value (60 in our case) to arrive at the attribution result; a drop in PCE does not necessarily result in a wrong decision. This improvement can be observed in Fig. 4, which shows the TPR for the same experiments when the threshold is set to 60, as proposed in [35]. The other implications of these figures are already well known in the field (i.e., flatfield images are better than textured ones, and as the number of images increases, the quality of the fingerprint also increases, resulting in higher PCE and TPR).

Fig. 4: The effect of texture in terms of TPR.

Table I shows the average time it takes to extract a fingerprint estimate by the two methods in the above experiment. Notice that in both cases the same number of images, m, is read from disk, but the SDA technique needs only one denoising operation, whereas the conventional method performs m denoising operations. This implies that as the number of training images increases, the speedup also increases, as the table shows.

TABLE I: Average time to extract fingerprints with the proposed and conventional methods (in sec); columns correspond to the five image-set sizes, smallest to largest.

SDA          | 4.97  | 5.99  | 8.22  | 10.35  | 14.49
Conventional | 21.57 | 40.81 | 79.96 | 118.79 | 196.59
Speedup      | 4.34  | 6.81  | 9.73  | 11.48  | 13.57
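The timing pattern in Table I is consistent with a simple linear cost model (ours, for intuition only): both methods read all m images, but in this experiment SDA denoises a single averaged frame while the conventional method denoises all m. The per-operation times below are hypothetical.

```python
def extraction_speedup(m, t_read, t_denoise, n_denoise_sda=1):
    """Ratio of conventional to SDA fingerprint-extraction time under a
    simple linear cost model: both methods read m images; the conventional
    method denoises m times, SDA denoises n_denoise_sda times."""
    conventional = m * (t_read + t_denoise)
    sda = m * t_read + n_denoise_sda * t_denoise
    return conventional / sda
```

As m grows, the read cost is incurred by both methods but the denoising saving scales with m, so the speedup increases with the training-set size, matching the trend in Table I.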
B. Fingerprint equivalence for textured images
For textured images, our analysis indicated that more images are needed by the SDA method, and hence a corresponding reduction in the speedup would occur. In this experiment, our goal is to investigate how many images the SDA method requires, compared to the conventional approach, to yield similar performance for textured images while still retaining a speedup in fingerprint computation. This experiment was again performed using images from the VISION dataset [13]. We created a training set of textured images for each camera in the dataset and built fingerprints from increasingly large image sets using the conventional approach, and likewise using the SDA method. As in the previous experiment, each fingerprint was partitioned into disjoint blocks, and correlations were computed with the corresponding blocks of the test PRNU noise patterns.

Fig. 5: Fingerprint equivalence for the SDA and conventional approaches. The x-axis indicates the number of images used by the conventional method. The left y-axis (red) is the number of images required by SDA, and the right y-axis (blue) is the speedup gained in that case.

Figure 5 shows the number of images required by the SDA approach to achieve at least the same TPR as the conventional approach, along with the speedup gained in these cases. For example, when a fingerprint is created from textured images the conventional way, the same TPR can be achieved by the SDA approach using a few times as many images while still extracting the fingerprint several times faster. The figure shows that, by using a few times more images for the SDA method, a substantial speedup can be achieved with no loss in TPR even when the images are textured.
In Section III, we showed that when the number of images used for fingerprint extraction is held constant, the TPR is expected to drop as the SDA-depth increases. To verify this remark, we used only textured images for fingerprint extraction; flatfield images were excluded because they result in a negligible performance difference between SDA and conventional fingerprints.

We then created fingerprints from the textured images of each camera in the VISION dataset at six SDA-depth settings, producing correspondingly fewer SDA-frames as the depth increases. The SDA-frames were denoised and then averaged to arrive at the final fingerprint estimate. For each fingerprint estimate computed, the rest of the images were used as test images. We correlated each
fingerprint with the PRNU noise extracted from the test images in a block-wise manner, as done in previous experiments. Notice that SDA-1 is the same as the conventional approach.

PCE | 652.8 | 514.6 | 390.0 | 332.2 | 285.0 | 252.4
TPR | 0.80  | 0.78  | 0.75  | 0.72  | 0.69  | 0.67
(columns ordered by increasing SDA-depth; the leftmost column, SDA-1, is the conventional approach)
TABLE II: The effect of SDA-depth on PCE and TPR.

Table II shows that as the SDA-depth increases, the average PCE decreases. For textured images, the more images we combine to create an SDA-frame, the lower the resulting PCE and TPR values. This supports the third observation of the analysis in Section III.

This section has provided a validation of Section III by experimentally supporting all three observations derived from the analysis. Namely, when images are not textured, and hence post-filtering noise is low, SDA and conventional fingerprints computed from the same images perform similarly, in which case SDA yields a speedup of up to roughly 13.5 times (Table I). On the other hand, textured images and larger SDA-depths require a higher number of images to achieve the same performance as the conventional approach; yet a substantial speedup can still be achieved in most cases.

In the next section, we apply the proposed approach to practical problems and show that SDA fingerprints can perform with significantly higher accuracy or result in significant speedup compared to state-of-the-art fingerprint extraction techniques.

V. APPLICATION TO COMPUTING VIDEO FINGERPRINTS
In this section, we investigate a more practical use case of the proposed SDA technique: its use for extracting FEs from videos. As Section II explains, two of the most common ways to extract a fingerprint from a video are using only I-frames or using all frames (or the first n frames). While the former results in low performance, the latter can be impractical in many real-life applications due to very high computational needs. For example, computing fingerprints from − minute videos (i.e., approximately frames per video) using a single thread may take up to a day. In this section, we provide experimental results that demonstrate how the SDA approach can significantly reduce the time needed to compute fingerprint estimates from video, while retaining the performance obtained using a significantly larger number of denoising operations with conventional approaches.

In each experiment below, three different types of fingerprints (i.e., I-frames only, SDA-frames, and ALL-frames) were obtained from each video. For brevity, we refer to them as I-FE (i.e., I-frame Fingerprint Estimate), SDA-FE, and ALL-FE, respectively. Moreover, in some cases, we add an indication of the SDA-depth when we need to highlight it. For example, SDA-50-FE indicates that the video frames were divided into groups of and each group was averaged to create an SDA-frame.

In the first experiment, we examine source matching for videos; that is, given two videos, can we determine whether they are from the same camera? Next, we investigate a more difficult case involving mixed media. In that subsection, we also analyze an important question related to mixed media: "What choice of SDA-depth best balances speed and performance?" In the next two subsections, we examine the performance achieved with videos and images obtained from social media such as Facebook and YouTube. Finally, we show how the proposed technique can be used for source attribution with moderate-length stabilized videos (i.e., up to minutes), from which obtaining a "reliable" FE might otherwise take a couple of hours each when using all frames.

Two datasets were used in the experiments: NYUAD-MMD and VISION. The NYUAD-MMD dataset contains images and videos of different resolutions and aspect ratios from cameras of different models and brands, which makes it a challenging dataset for mixed media attribution. Moreover, it contains stabilized videos longer than minutes from cameras. Hence, we used this dataset for the experiments with mixed media and stabilized video. The videos in the dataset are typically around seconds long (i.e., each video is approximately frames) and the images are pristine (i.e., no out-camera operations). The VISION dataset contains different high-quality videos and images from social media such as Facebook and YouTube. Hence, we used this dataset in the experiments involving social media.

A. Matching Two Non-Stabilized Videos
In the first experiment, we examine source matching for videos using FEs computed with the three approaches presented above. Our goal was to estimate the length of video, and the resulting computation time, needed to achieve greater than TPR for I-FEs, SDA-FEs, and ALL-FEs. This allows a clear comparison of the three approaches.

FEs were first created from the non-stabilized videos of the same resolution in the VISION dataset. FEs were extracted from the first , , . . . seconds of each video using the two techniques mentioned in Section II and the proposed method. On average, each video had approximately one I-frame per second. We selected an SDA-depth of , resulting in one SDA-frame per second of video.
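The grouping-and-averaging step underlying an SDA-FE can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: `denoise` here is a trivial neighbourhood-average stand-in for the wavelet-based denoising filter actually used for PRNU extraction, and the post-processing steps of the full pipeline (zero-meaning, Wiener filtering) are omitted.

```python
import numpy as np

def denoise(img):
    # Stand-in smoothing filter: in the actual pipeline this would be the
    # wavelet-based denoiser used for PRNU extraction.
    return (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
            np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4.0

def sda_fingerprint(frames, depth):
    """Estimate a fingerprint from `frames` using the given SDA-depth.

    depth = 1 reduces to the conventional approach (denoise every frame);
    depth = len(frames) denoises a single averaged frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    residuals = []
    for start in range(0, len(frames), depth):
        sda_frame = frames[start:start + depth].mean(axis=0)  # spatial-domain average
        residuals.append(sda_frame - denoise(sda_frame))      # one denoising op per group
    return np.mean(residuals, axis=0)  # final fingerprint estimate
```

With depth d, a video of n frames costs only ⌈n/d⌉ denoising operations instead of n, which is the source of the speedup.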
Fig. 6: TPR for different lengths of video using I-FEs, SDA-FEs, and ALL-FEs
Figure 6 shows the TPR using I-FEs, SDA-FEs, and ALL-FEs as the length of the videos increases. As seen, SDA-FEs outperform ALL-FEs in this setting for all video lengths. The difference varies between . (for -sec videos) and . (for -sec videos). Both FEs achieve significantly higher TPR than I-FEs. For example, for -second videos, SDA-FEs and ALL-FEs result in . and . TPR, respectively, whereas I-FEs can only reach . TPR. The highest TPR achieved using I-FEs was . (i.e., for -second videos), which is still lower than the TPR of SDA-FEs and ALL-FEs computed from only -second videos (i.e., more than ). This is because SDA-FEs and ALL-FEs use all the frames in a 5-second video (i.e., I-, B-, or P-frames), whereas the I-FEs use only I-frames on average and "waste" the rest of the frames. Hence, for this setting, I-FEs fail to reach an accuracy comparable to the other two methods.

TABLE III: Time for video fingerprint extraction (in seconds)
type      averaging   I/O + denoising   total
I-FE          0             50            50
SDA-FE       12             50            62
ALL-FE        0           1407          1407
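The relative costs in Table III can be checked directly; the figures below are taken from the table (per-video averaging, I/O plus denoising, and total times in seconds), and everything else is simple arithmetic:

```python
# Per-approach extraction times from Table III (seconds, one 50 s Full HD video).
times = {
    "I-FE":   {"averaging": 0,  "io_denoise": 50,   "total": 50},
    "SDA-FE": {"averaging": 12, "io_denoise": 50,   "total": 62},
    "ALL-FE": {"averaging": 0,  "io_denoise": 1407, "total": 1407},
}

# Sanity check: the components add up to the reported totals.
for name, t in times.items():
    assert t["averaging"] + t["io_denoise"] == t["total"], name

# Speedup of SDA-FE over ALL-FE for videos of equal length.
speedup = times["ALL-FE"]["total"] / times["SDA-FE"]["total"]
print(round(speedup, 1))  # 22.7, consistent with the speedup row of Table IV
```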
We then estimated the time required to extract each type of FE from a -second Full HD video captured at 30 FPS. Table III compares the average times. It takes , , and seconds for an I-FE, SDA-FE, and ALL-FE, respectively. However, these times apply when each FE is obtained from a -second video. When we evaluate the time required to achieve TPR, less than seconds of video is needed for SDA-FEs and ALL-FEs, whereas I-FEs require seconds of video. This suggests that the required times for SDA-FEs and ALL-FEs are less than and seconds, respectively. Hence, the SDA technique is at least times faster than I-FEs and requires times shorter videos, yet still achieves a higher TPR. Moreover, it performs up to . higher than ALL-FEs in terms of TPR and speeds up extraction approximately . times in this setting. Furthermore, while SDA-FEs can achieve TPR with -second videos, ALL-FEs need -second videos for the same. Therefore, a speedup of close to times can be achieved in this case when the SDA-depth is set to . Notice that these results involve videos that did not undergo any processing such as scaling or the recompression applied by social media. Also, all videos in the VISION dataset were taken with high luminance. Therefore, performance may be lower on more difficult datasets, such as when videos are dark or processed. However, our intention here was to first demonstrate the effectiveness of the SDA approach for the simplest of cases. We examine more challenging situations in the experiments below.

B. Mixed Media Attribution
As we saw in the previous subsection, using I-FEs causes a significant drop in TPR, whereas − seconds of video is enough to achieve more than TPR for both SDA-FEs and ALL-FEs. In this subsection, we investigate a more challenging scenario in which a video FE needs to be matched with a single query image. In [14], source attribution with mixed media was investigated using the NYUAD-MMD dataset, a very challenging dataset containing images and videos of various resolutions from cameras. Here, we performed the "train on videos, test on images" experiment for I-FEs, SDA-FEs, and ALL-FEs. That is, a camera FE was computed from the video, and the query image was cropped and resized and its PRNU matched with the FE. The resizing and cropping parameters used to perform the matching were obtained from the "train on images, test on videos" experiment done in [14].

The videos in this dataset were typically around seconds long, each having approximately frames. The dataset contains a total of non-stabilized videos and images from those cameras. Each video FE was correlated with the PRNU noise of all the test images from the same camera to estimate the "true cases," which resulted in correlations. Then, each video FE from the i-th camera was compared with the PRNU noise of images from the (i+1)-th camera, using the resizing and cropping parameters that maximize the PCE for the image FE (i.e., the FE obtained from all images of the camera using the conventional approach). This way, we estimated the "false cases," resulting in correlations.

In the previous experiment we used a fixed SDA-depth, d, of . In this experiment we used different SDA-depths to investigate their impact on performance and speed. Given a video of m frames (in our case approximately frames), we divided the frames into groups of d = 1, , , , , , .
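For a video of m frames and depth d, the number of SDA-frames, and hence of denoising operations, is g = ⌈m/d⌉. A quick sketch, assuming m = 1200 frames (a single SDA-frame is produced when d = 1200) and the depth set listed in the columns of Table IV:

```python
import math

m = 1200  # assumed frame count per video
depths = [1, 5, 10, 30, 50, 200, 1200]  # d = 1 is the conventional approach

# Number of SDA-frames, i.e. denoising operations, per depth.
groups = {d: math.ceil(m / d) for d in depths}
print(groups)  # {1: 1200, 5: 240, 10: 120, 30: 40, 50: 24, 200: 6, 1200: 1}
```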
Therefore, the number of SDA-frames, g, became , , , , , , respectively. When d = 1, the technique is the same as using all frames, whereas when d = 1200, only a single SDA-frame is created by averaging all frames. After obtaining the PCE of the "true" and "false" cases, we created an ROC curve for each video FE type and depth. Figure 7 shows the ROC curves for each of the SDA-FEs of different depths, as well as the I-FE and ALL-FE. The results show that ALL-FE yields the highest performance, whereas I-FE performs significantly worse than the others. The proposed SDA method performs close to the ALL-FE method for all depths.

Fig. 7: The ROC curves for varying SDA-depths

Table IV shows more detailed results. |PCE| stands for the average of the PCE ratios with respect to I-FEs. For example, when an ALL-FE from the i-th video is correlated with the noise of the j-th image, its PCE is on average . times higher compared to the I-FE obtained from the same video. The reason
we used such a normalization instead of the average PCE is that outliers have a big impact on the average PCE. Moreover, the table shows the TPR for a PCE threshold of , the average time to extract a FE, and the speedup compared to ALL-FEs. As seen, the results indicate that the TPR of the SDA method is very close to that of ALL-FE. However, a speedup of up to times can be achieved using the SDA method.

TABLE IV: Detailed information for mixed media attribution
           I-    ALL-      5     10     30     50    200   1200
|PCE|
TPR (%)          64.0
speedup  28.1     1.0    5.1    9.9   22.7   29.3   44.0

Similar to the previous experiment, using I-FEs yields significantly lower accuracy (at least lower TPR). Moreover, when the SDA-depth is ≥ , SDA-FEs are faster to extract than I-FEs. Notice that when ALL-FEs are used, it takes approximately five days to extract all the FEs from the videos in the NYUAD-MMD dataset using a single-threaded implementation. This kind of cost is clearly impractical for many applications.

C. Train and test on YouTube videos
This experiment explores the performance achieved when two video FEs from YouTube are correlated. Although this experiment is essentially the same as that in Section V-A, it is relevant in practice because high compression is involved. Note that a key motivation of the SDA approach is that when high compression is used, a large number of frames is needed to compute a reliable FE. We created FEs from all non-stabilized YouTube videos in the VISION dataset (i.e., the ones labeled flatYT, indoorYT, and outdoorYT) using only I-frames, SDA- , SDA- , SDA- , and ALL-frames. Here, we used the first , , . . . seconds of the YouTube videos to extract FEs. Each -second video had approximately frames that were used for the SDA- or ALL-FEs, whereas it contained . I-frames on average. After fingerprint extraction, we correlated each video FE with the others of the same type and same length taken by the same camera. For example, an I-FE from seconds of video is correlated with all I-FEs obtained from the rest of the -second videos from the same camera. The same was done for the SDA- and ALL-FEs. This way, a total of correlations were done for each type.

Fig. 8: The effect of FE type and video length on TPR for YouTube videos

Figure 8 shows the TPR for varying lengths of video for each FE type. The figure shows that I-FEs perform very poorly in all cases, and any FE type created from more than seconds of video outperforms I-FEs. While ALL-FEs perform better than SDA-FEs for same-length videos, this difference can be overcome by increasing the video length while still using far fewer denoising operations. For example, SDA− obtained from -second videos, or SDA− from -second videos, performs approximately the same as ALL-FEs obtained from seconds (within a ± TPR range). Hence, instead of using frames for ALL-FEs, using frames for SDA− can result in a significant speedup with no loss in TPR. While an ALL-FE from frames of a Full HD video takes seconds to compute, an SDA− FE from frames, which performs only denoising operations instead of , takes seconds to compute. Therefore, a speedup of close to times can be achieved with SDA− with a increase in TPR. Notice that, because most videos in the VISION dataset are around seconds long, this limits the maximum length we could use in our experiments.

D. Train on Facebook images, test on YouTube videos
From the previous experiments, we know that the SDA method achieves a significant speedup for both videos and images, with a small loss in performance that can be overcome by increasing the number of still images used for fingerprint extraction, when available. In this experiment, our goal was to show that the proposed method can be successfully applied to other social media. Specifically, in this subsection, we extract FEs from Facebook images and match them with the FEs of YouTube videos. We call this the "train on Facebook images, test on YouTube videos" experiment. This experiment matters because both media-sharing services contain billions of visual media items, and computing ALL-FEs from these collections has a very high time complexity. Therefore, faster fingerprint extraction methods (along with search techniques) that speed up attribution are badly needed. In this experiment, for the cameras in the VISION dataset that had non-stabilized videos, we created a FE from
Facebook images (i.e., the ones labeled FBH) using the conventional fingerprint computation method. We then used the FEs from non-stabilized YouTube videos (those created in the previous experiment). We again used I-frames, SDA- , SDA- , SDA- , and ALL-frames computed from the first seconds of the YouTube videos. We then correlated the image FE of a camera with the FE of each video of each type using the efficient search proposed in [14]; a total of pairs were compared for each FE type. Table V shows the TPR of these correlations. Similar to the "train on videos, test on images" experiment, these results show that FEs obtained from Facebook images match the YouTube videos with . TPR for SDA- , which is higher than both ALL-FEs and I-FEs. On the other hand, FEs from I-frames yield approximately lower TPR. These results show that the SDA approach is a good replacement for I-FEs or ALL-FEs in this scenario.

TABLE V: TPR of different FE types when a FE from Facebook images and another from YouTube videos are correlated
        I-FE   SDA-   SDA-   SDA-   ALL-FE
TPR    51.60   81.4   79.88  78.13   79.59
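Throughout these experiments, matches are decided by the Peak-to-Correlation Energy (PCE) between a fingerprint estimate and a query noise residual. A minimal numpy sketch of the usual PCE computation follows (circular cross-correlation via FFT, with the peak energy compared against the average energy elsewhere); the exclusion-window size and the simplifications here are assumptions, not the exact implementation used in the paper:

```python
import numpy as np

def pce(fingerprint, residual, exclude=5):
    """Peak-to-Correlation Energy between two equal-size 2-D arrays."""
    f = fingerprint - fingerprint.mean()
    r = residual - residual.mean()
    # Circular cross-correlation via the FFT.
    xcorr = np.real(np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(r))))
    py, px = np.unravel_index(np.argmax(np.abs(xcorr)), xcorr.shape)
    peak = xcorr[py, px]
    # Energy of the correlation surface, excluding a window around the peak.
    mask = np.ones_like(xcorr, dtype=bool)
    mask[max(0, py - exclude):py + exclude + 1,
         max(0, px - exclude):px + exclude + 1] = False
    return peak ** 2 / np.mean(xcorr[mask] ** 2)
```

A residual from the fingerprint's own camera should produce a markedly higher PCE than one from a different camera; a detection threshold (such as the one used for Table IV) is then applied to this ratio.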
E. Matching two stabilized videos
A recent work [12] has shown that a FE obtained from a long stabilized video can successfully be matched with other videos from the same camera. However, thousands of frames must be denoised, which may not be practical in many circumstances. A potential alternative is the SDA method, which may lead to a significant speedup. To evaluate this, we captured stabilized videos from cameras. A total of videos were captured, adding up to minutes. We extracted FEs from the frames of , , . . . -second video lengths using the conventional (I-frame and ALL-frame) methods as well as the SDA method with SDA-depths of , , and . These depths were deemed reasonable choices based on the previous experiments. As shown in [8], [10], [11], the first frame of a video is typically not geometrically transformed. Since we divide each video into pieces, some video pieces do not contain an untransformed frame, so we discarded the first frame of each video to avoid inconsistencies. We correlated each FE with the other FEs of different videos from the same camera that were created using the same number of frames. For example, the SDA−-FEs of -second videos are correlated with the same type of FEs from the same camera. Figure 9 shows the TPR for three cameras (i.e., the Huawei Honor, Samsung S8, and iPhone 6 Plus) and the overall average of all five cameras.

Fig. 9: TPR for stabilized videos for varying SDA-depths

The results show that as videos get longer, ALL-FEs and SDA-FEs achieve higher TPR. Moreover, the effect of increased SDA-depth is more significant here than for non-stabilized videos. While for some cameras ALL-FEs and SDA-FEs perform similarly (e.g., the Huawei and Samsung cameras), for others (e.g., the iPhone camera) there is a significant difference between the two. For example, for the Samsung S8,
an SDA−-FE from seconds of video performs similarly to a -second ALL-FE. Therefore, for this particular case, SDA− can speed up extraction by a factor of (i.e., ×) (see Table IV for the times). On the other hand, for the iPhone 6 Plus, ALL-FEs from seconds of video and -second SDA−-FEs have similar TPR, so a times (i.e., ×) speedup can be achieved in this case. Hence, a speedup between these two figures (i.e., and ) can be achieved without any loss in TPR if a long video is available.

Overall, this section shows that the proposed SDA-FEs outperform the commonly used I-frame-only technique in all the video cases, including mixed media, stabilized videos, and social media. Meanwhile, SDA-FEs achieve results comparable to ALL-FEs with up to a times speedup in these experiments. We also show the impact of SDA-depth on the performance that can be achieved in various cases.

VI. CONCLUSION AND FUTURE WORK
This paper has investigated camera fingerprint extraction using Spatial Domain Averaged frames, which are the arithmetic means of multiple still images. By adding one extra averaging step before denoising, a significant speedup can be achieved for fingerprint extraction. We show that this technique can successfully be used for images, non-stabilized videos, and stabilized videos to speed up the fingerprint extraction process. The proposed method is especially useful when the number of denoising operations needed would otherwise be very high, for example, when dealing with non-stabilized or highly compressed stabilized videos or with images from social media.

It is often assumed that for video source attribution, using only I-frames for fingerprint extraction (I-FEs) is "enough" to achieve high performance. However, in this research, we have shown that I-FEs perform poorly compared to ALL-FEs in all cases. On the other hand, using ALL-FEs is impractical due to the large computation time needed in realistic scenarios where thousands of videos may be available. The proposed SDA approach resolves both the accuracy problem of I-FEs and the speed problem of ALL-FEs: SDA-FEs and ALL-FEs perform similarly in most cases, and when the SDA method performs worse, this can be overcome by using more of the available frames, if any.

The proposed technique can also be used for other source attribution problems where many denoising operations are needed. For instance, this method can be applied when many "partially misaligned" still images and a suspect camera are available; the frames of a seam-carved video, for example, are partially misaligned with its source camera's fingerprint.
In such a scenario, instead of denoising all frames of the video, the SDA technique can be used to speed up the process. Moreover, determining whether a video is stabilized is another task that requires a number of denoising operations. As an alternative to using only I-frames, the proposed SDA technique could work with only denoising operations.

Another avenue for future research is to create an SDA-FE in a weighted manner so that the performance achieved with the SDA method can be increased. Two potential ways to achieve this are weighting I-, P-, and B-frames differently, and weighting the frames in a block-by-block manner. For example, it has been shown that flatfield images perform better with the SDA method than textured ones. Using this idea, one may weight textured regions differently from smooth regions.

REFERENCES
[1] J. Lukas, J. Fridrich, and M. Goljan, "Digital camera identification from sensor pattern noise," IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 205–214, 2006.
[2] M. Goljan, J. Fridrich, and T. Filler, "Managing a large database of camera fingerprints," in Media Forensics and Security II, vol. 7541. International Society for Optics and Photonics, 2010, p. 754108.
[3] S. Bayram, H. T. Sencar, and N. Memon, "Efficient sensor fingerprint matching through fingerprint binarization," IEEE Transactions on Information Forensics and Security, vol. 7, no. 4, pp. 1404–1413, 2012.
[4] D. Valsesia, G. Coluccia, T. Bianchi, and E. Magli, "Compressed fingerprint matching and camera identification via random projections," IEEE Transactions on Information Forensics and Security, vol. 10, no. 7, pp. 1472–1485, July 2015.
[5] S. Bayram, H. T. Sencar, and N. Memon, "Sensor fingerprint identification through composite fingerprints and group testing," IEEE Transactions on Information Forensics and Security, vol. 10, no. 3, pp. 597–612, March 2015.
[6] S. Taspinar, H. T. Sencar, S. Bayram, and N. Memon, "Fast camera fingerprint matching in very large databases," in Image Processing (ICIP), 2017 IEEE International Conference on. IEEE, 2017, pp. 4088–4092.
[7] W.-H. Chuang, H. Su, and M. Wu, "Exploring compression effects for improved source camera identification using strongly compressed video," in Image Processing (ICIP), 2011 18th IEEE International Conference on. IEEE, 2011, pp. 1953–1956.
[8] S. Taspinar, M. Mohanty, and N. Memon, "Source camera attribution using stabilized video," in Information Forensics and Security (WIFS), 2016 IEEE International Workshop on. IEEE, 2016, pp. 1–6.
[9] S. Chen, A. Pande, K. Zeng, and P. Mohapatra, "Video source identification in lossy wireless networks," in IEEE INFOCOM, 2013, pp. 215–219.
[10] M. Iuliani, M. Fontani, D. Shullani, and A. Piva, "A hybrid approach to video source identification," arXiv preprint arXiv:1705.01854, 2017.
[11] S. Mandelli, P. Bestagini, L. Verdoliva, and S. Tubaro, "Facing device attribution problem for stabilized video sequences," IEEE Transactions on Information Forensics and Security, 2019.
[12] J. Lubin, M. Isnardi, C. Spence, I. Sur, and A. Chaudhry, "Joint sensor fingerprinting and processing history recovery for visual media forensics," Private conversation, 2018.
[13] D. Shullani, M. Fontani, M. Iuliani, O. Al Shaya, and A. Piva, "Vision: a video and image dataset for source identification," EURASIP Journal on Information Security, vol. 2017, no. 1, p. 15, 2017.
[14] S. Taspinar, M. Mohanty, and N. Memon, "Source camera attribution of multi-format devices."
[15] J. Lukáš, J. Fridrich, and M. Goljan, "Digital camera identification from sensor pattern noise," IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 205–214, 2006.
[16] Y. Sutcu, S. Bayram, H. T. Sencar, and N. Memon, "Improvements on sensor noise based source camera identification," in IEEE International Conference on Multimedia and Expo, 2007, pp. 24–27.
[17] C. T. Li and Y. Li, "Color-decoupled photo response non-uniformity for digital image forensics," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 2, pp. 260–271, 2012.
[18] G. Chierchia, S. Parrilli, G. Poggi, C. Sansone, and L. Verdoliva, "On the influence of denoising in PRNU based forgery detection," in ACM Multimedia in Forensics, Security and Intelligence, 2010, pp. 117–122.
[19] C. T. Li, "Source camera identification using enhanced sensor pattern noise," IEEE Transactions on Information Forensics and Security, vol. 5, no. 2, pp. 280–287, 2010.
[20] W. Yaqub, M. Mohanty, and N. Memon, "Towards camera identification from cropped query images," in . IEEE, 2018, pp. 3798–3802.
[21] S. Bayram, H. T. Sencar, and N. Memon, "Seam-carving based anonymization against image & video source attribution," in IEEE Workshop on Multimedia Signal Processing, 2013, pp. 272–277.
[22] S. Taspinar, M. Mohanty, and N. Memon, "Prnu based source attribution with a collection of seam-carved images," in Image Processing (ICIP), 2016 IEEE International Conference on. IEEE, 2016, pp. 156–160.
[23] M. Goljan and J. Fridrich, "Camera identification from scaled and cropped images," Proc. SPIE, Electronic Imaging, Forensics, Security, Steganography, and Watermarking of Multimedia Contents X, vol. 6819, pp. 68190E–68190E–13, 2008.
[24] E. J. Alles, Z. J. Geradts, and C. J. Veenman, "Source camera identification for low resolution heavily compressed images," in Computational Sciences and Its Applications, 2008. ICCSA'08. International Conference on. IEEE, 2008, pp. 557–567.
[25] K. Rosenfeld and H. T. Sencar, "A study of the robustness of prnu-based camera identification," in Media Forensics and Security, ser. SPIE Proceedings, E. J. Delp, J. Dittmann, N. D. Memon, and P. W. Wong, Eds., vol. 7254. SPIE, 2009, p. 72540.
[26] M. Goljan, J. Fridrich, and J. Lukáš, "Camera identification from printed images," Proceedings of SPIE
[27] Signal Processing Systems, vol. 1, pp. 1–18, June 2012.
[28] M. Chen, J. Fridrich, M. Goljan, and J. Lukas, "Source digital camcorder identification using sensor photo response non-uniformity," in SPIE Electronic Imaging, 2007, pp. 1G–1H.
[29] S. McCloskey, "Confidence weighting for sensor fingerprinting," in IEEE CVPR Workshops, 2008, pp. 1–6.
[30] N. Ejaz, W. Kim, S. I. Kwon, and S. W. Baik, "Video stabilization by detecting intentional and unintentional camera motions," in IEEE International Conference on Intelligent Systems, Modelling and Simulation, 2012, pp. 312–316.
[31] Y. Matsushita, E. Ofek, W. Ge, X. Tang, and H.-Y. Shum, "Full-frame video stabilization with motion inpainting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 1150–1163, July 2006.
[32] M. K. Mihcak, I. Kozintsev, K. Ramchandran, and P. Moulin, "Low-complexity image denoising based on statistical modeling of wavelet coefficients," IEEE Signal Processing Letters, vol. 6, no. 12, pp. 300–303, 1999.
[33] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Bm3d image denoising with shape-adaptive principal component analysis," in SPARS'09, Signal Processing with Adaptive Sparse Structured Representations, 2009.
[34] J. Fridrich, "Sensor defects in digital image forensic," Digital Image Forensics, pp. 1–43, 2013.
[35] M. Goljan, J. Fridrich, and T. Filler, "Large scale test of sensor fingerprint camera identification," in