Camera Fingerprint Extraction via Spatial Domain Averaged Frames
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 1
Samet Taspinar, Manoranjan Mohanty, and Nasir Memon
Abstract—Photo Response Non-Uniformity (PRNU) based camera attribution is an effective method to determine the source camera of visual media (an image or a video). To apply this method, images or videos need to be obtained from a camera to create a "camera fingerprint" which can then be compared against the PRNU of the query media whose origin is in question. The fingerprint extraction process can be time-consuming when a large number of video frames or images have to be denoised. This may need to be done when the individual images have been subjected to high compression or other geometric processing such as video stabilization. This paper investigates a simple, yet effective and efficient, technique for creating a camera fingerprint when many still images need to be denoised. The technique utilizes Spatial Domain Averaged (SDA) frames. An SDA-frame is the arithmetic mean of multiple still images. When it is used for fingerprint extraction, the number of denoising operations can be significantly decreased with little or no performance loss. Experimental results show that the proposed method can work many times faster than conventional methods while providing similar matching results.

Index Terms—PRNU, video forensics, camera fingerprint extraction, image forensics.
I. INTRODUCTION
PRNU-based source camera attribution is a well-studied and successful method in media forensics for finding the source camera of an anonymous image or video [1]. The method is based on the unique Photo Response Non-Uniformity (PRNU) noise of a camera sensor array stemming from manufacturing imperfections. This PRNU noise can act as a camera fingerprint. The PRNU approach is often used in two scenarios: camera verification and camera identification. Camera verification aims to establish whether a given query image or video was taken by a suspect camera. This is done by correlating the noise estimated from the query image or video with the fingerprint of the camera, which is usually computed by taking pictures with the camera under controlled conditions. In camera identification, the potential source camera of the query image or video is determined from a large database of camera fingerprints. One can view camera identification as essentially the same as performing n camera verification tasks, where n is the number of camera fingerprints in the database. However, when performing identification, it is assumed that the camera fingerprints are pre-computed.

In both verification and identification, it is often the case that there is no camera available to create fingerprints under controlled conditions. Rather, camera fingerprints are estimated from a set of publicly available media assumed to be from the same camera. Such media can have a very diverse range of quality and content and often lacks metadata.

Samet Taspinar (email: [email protected]) and Manoranjan Mohanty (email: [email protected]) are with the Center for Cyber Security, New York University Abu Dhabi, UAE. Nasir Memon (email: [email protected]) is with the Department of Computer Science and Engineering, New York University, New York, USA.

For efficient fingerprint matching in large databases, various approaches have been proposed. Fridrich et al.
[2] proposed the use of fingerprint digests, in which a subset of fingerprint elements having the highest sensitivity is used instead of the entire fingerprint. Bayram et al. [3] introduced binarization, where each fingerprint element is represented by a single bit. Valsesia et al. [4] proposed applying random projections to reduce the fingerprint dimension. Bayram et al. [5] introduced group testing via composite fingerprints, which focuses on decreasing the number of correlations rather than decreasing the size (storage) of a fingerprint. Recently, Taspinar et al. [6] proposed a hybrid approach that decreases both the size of a fingerprint and the number of correlations. All these methods were designed and tested for images; however, they can also be used for videos.

Although the image-centric PRNU-based method can be extended to video [7]–[9], source camera attribution with video presents a number of new challenges. First, a video frame is much more compressed than a typical image. Therefore, the PRNU signal extracted from a video frame is of significantly lower quality than one obtained from an image. As a result, a larger number of video frames is required to compute the fingerprint. In fact, Chuang et al. [7] found that it is best to use all the frames instead of only the I- or P-frames to compute a fingerprint. Using a large number of frames can introduce significant computation overhead. For example, computing a fingerprint from the I-frames of a one-minute HD video requires one to two minutes, whereas using all frames takes substantially longer.

In the case of camera identification, the amount of computation can be prohibitive in practical scenarios. For example, computing fingerprints from a thousand one-minute Full HD videos (using all frames) on a PC may take many days. Clearly, with billions of media objects uploaded every day on the Internet, large-scale camera source identification quickly becomes infeasible.
Although camera fingerprints stored in a database may have to be computed just once by a system, computing a fingerprint estimate at run-time from a query video can be prohibitive when faced with a reasonable number of query videos presented to the camera identification system in a day.

Besides source camera identification, digital stabilization operations performed within modern cameras also present a significant challenge for PRNU-based source camera verification for video [8], [10], [11]. Video stabilization results in sensor-pixel misalignments between individual frames of the video because the geometric transformations performed to compensate for camera motion and spatially align each frame differ from frame to frame. An accurate camera fingerprint cannot be obtained from misaligned frames as is done with non-stabilized video, even if the video quality is very high. Although there are some preliminary methods that address source camera verification for stabilized video [8], [10], these methods are either limited in scope or have low performance (low true positive rate) and high computation overhead. An alternate approach to address the stabilization issue for a fairly long video (at least a couple of minutes) [12] is to use a large number of frames for computing the fingerprint. The idea is that with a large number of frames, there will be a sufficient number of aligned pixels at each spatial location, allowing the computation of an accurate fingerprint. As discussed above, however, this approach can again introduce computation overhead unsuitable for practical use.

As a third example, modern devices such as smartphones capture different types of media at different resolutions. For example, most cameras do not use the full sensor resolution when capturing a video; they downsize the sensor output to a lower resolution using proprietary and often unknown in-camera processing techniques.
For such a challenging task, PRNU-based source camera matching may often fail if only I-frames are used.

This paper proposes a computationally efficient way to compute a camera fingerprint from a large number of media objects, such as the individual frames of a video or a large number of highly compressed images taken from a social media platform. In contrast to the two-step conventional fingerprint computation method (which first estimates the PRNU noise of each frame using a denoising filter and then averages the individual PRNU noise estimates to get a reliable fingerprint estimate), the proposed method uses a three-step approach: frame averaging, denoising, and noise averaging. The frame averaging step takes the arithmetic mean of the frames in the spatial domain, resulting in a
Spatial Domain Averaged frame (SDA-frame) (Figure 2). In the second step, each SDA-frame is denoised, and an averaging of the estimated PRNU noise is then done to arrive at the final fingerprint estimate. The goal here is to minimize the number of denoising operations (as denoising is the most expensive step), and also to suppress scene-dependent noise by averaging multiple frames. Experiments with the VISION dataset [13] and NYUAD-MMD [14] show that the proposed method provides significant speedup in computing accurate fingerprints. It achieves a significantly higher true positive rate than a fingerprint computed from I-frames only, and a much lower computation cost than a fingerprint obtained from all available frames while yielding similar performance.

The rest of the paper is organized as follows. Section II summarizes the PRNU-based method and provides an overview of how digital video stabilization works. Section III explains the proposed fingerprint extraction method using SDA-frames, along with an analysis comparing it with the conventional approach. The insights obtained from the analysis are experimentally validated in Section IV. Section V examines applications for which the SDA-frame-based technique can be used and reports the improvements that can be achieved using an SDA-based method in those cases. Section VI provides a discussion of future work and concludes the paper.

II. BACKGROUND AND RELATED WORK
In this section, we provide a brief review of PRNU-based source camera attribution and video stabilization.
A. PRNU-based Source Camera Attribution
PRNU-based camera attribution is established on the fact that the output of the camera sensor, I, can be modeled as

I = I^(0) + I^(0) K + ψ,    (1)

where I^(0) is the noise-free still image, K is the PRNU noise, and ψ is the combination of additional noise, such as readout noise, dark current, shot noise, content-related noise, and quantization noise. The multiplicative PRNU noise pattern, K, is unique for each camera and can be used as a camera fingerprint, which enables the attribution of visual media to its source camera. Using a denoising filter F (such as a wavelet filter) on a set of images (or video frames) of a camera, we can estimate the camera fingerprint by first getting the noise residual, W_k (i.e., the estimated PRNU), of the k-th image as W_k = I_k − Î_k^(0), with Î_k^(0) = F(I_k), and then averaging the noise residuals of all the images. To determine whether a specific camera has taken a given query image, we first obtain the noise residual of the query image using F and then correlate this noise residual with the camera fingerprint estimate.

For images, the PRNU-based method has been well studied. Following the seminal work in [1], much research has been done to improve the scheme [15]–[19] and to make camera identification effective in practical situations [2], [3], [5], [6], [20]. Researchers have also studied the effectiveness of the PRNU-based method by proposing various counter-forensics and anti-counter-forensics methods [21], [22]. It has also been shown that the PRNU method can withstand a multitude of image processing operations, such as cropping, scaling [23], compression [24], [25], blurring [24], and even printing and scanning [26].

In contrast, less work has been dedicated to PRNU-based camera attribution from video [27]. Mo Chen et al. [28] first extended the PRNU-based approach to camcorder videos.
They used Normalized Cross-Correlation (NCC) to correlate fingerprints calculated from two videos, as the videos may be subject to translation shift, e.g., due to letter-boxing. To compensate for the blockiness artifacts introduced by heavy compression (such as MPEG-x and H.26x compression), they discard the boundary pixels of a block (e.g., a JPEG block). In [29], McCloskey proposed a confidence weighting scheme that can improve PRNU estimation from a video by minimizing the contribution of regions of the scene that are likely to distort the PRNU noise (e.g., excluding high-frequency content). Chuang et al. [7] studied the PRNU-based source camera identification problem with a focus on smartphone cameras. Since smartphone videos are subject to high compression, they considered only I-frames for fingerprint calculation and correlation. Chen et al. [9] proposed a method to find PRNU noise from wireless streaming videos, which are subject to blocking and blurring. In their approach, they divided a video frame into multiple blocks and discarded the blocks having significant blocking or blurring artifacts. Chuang et al. [7] showed that the best possible fingerprint can be computed when all the frames are considered (instead of using only the I- or P-frames). However, to the best of our knowledge, efficient computation of a fingerprint from a given video is a relatively unexplored area.
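The estimation and matching pipeline reviewed above (per-image noise residuals, residual averaging, and correlation against a query residual) can be sketched as follows. This is a minimal illustration, not the implementation used in the literature: a simple box blur stands in for the wavelet denoising filter F, and plain residual averaging is used for the fingerprint.

```python
import numpy as np

def denoise(img, k=3):
    """Stand-in for the denoising filter F: a simple box blur.
    (The cited works use a wavelet-based filter; this is only a sketch.)"""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def noise_residual(img):
    """W = I - F(I): the per-image PRNU noise estimate."""
    img = np.asarray(img, dtype=np.float64)
    return img - denoise(img)

def fingerprint(images):
    """Average the residuals of many images to estimate the fingerprint K."""
    return np.mean([noise_residual(im) for im in images], axis=0)

def ncc(a, b):
    """Normalized correlation between a query residual and a fingerprint."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

On synthetic data generated from the multiplicative model I = I^(0)(1 + K) + noise, the residual of a same-camera image correlates markedly higher with the fingerprint than the residual of an image from a different camera.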
B. Affine Transformation in Video Stabilization
Fig. 1: Video stabilization pipeline. This figure is a modified version of a figure that appeared in [30].

An out-of-camera digital video stabilization process contains three major stages: camera motion estimation, motion smoothing, and motion correction (Figure 1) [30], [31]. In the motion estimation step, the global inter-frame motion between adjacent frames of a non-stabilized video is modeled from the optical flow vectors of the frames using an affine transformation. In the motion smoothing step, unintentional translations, rotations, and shearing are filtered out of the global motion vectors using a low-pass filter. Finally, in the motion correction step, the stabilized video is created by shifting, rotating, shearing, or zooming frames according to the parameters in the filtered motion vector. Since each video frame can use different parameters, pixels can be misaligned with the sensor array. For example, one frame can be rotated by -1 degree while another by 0.5 degrees.

Digital video stabilization presents a big challenge for PRNU-based camera attribution. The frame-specific affine transformations described above make the PRNU method ineffective, as there is misalignment between frames. The brute-force methods [10], [22] proposed to address the stabilization issue have had limited success and low performance. These methods try to overcome the desynchronization issue by first finding the stabilization parameters through an exhaustive search and then performing the corresponding inverse affine transformation. Such methods, therefore, have very high computation overhead. Recently, Mandelli et al. [11] improved over brute-force approaches by using a best-fit reference frame in the parameter-search process rather than the first frame of the given video. The best-fit reference frame is obtained by looking for a frame that matches the largest number of frames. Their approach also has high computation overhead.

III. SPATIAL DOMAIN AVERAGING
As mentioned in the introduction, this paper proposes spatial domain averaging for computing camera fingerprints, which reduces the number of denoising operations when many visual objects are available. In the proposed method, efficient computation of a fingerprint is achieved by first creating averaged frames from a large collection, and then using these averaged frames for computing the fingerprint. For example, given a video with m frames, g non-intersecting equal-sized subgroups are formed, each with d = m/g frames. A Spatial Domain Averaged frame (SDA-frame) is created from each subgroup by taking the mean of the d frames in the subgroup. Then, in the second step, each SDA-frame is denoised, and an averaging of the estimated PRNU noise patterns is done to arrive at the final camera fingerprint estimate. In this manner, the number of frames that are denoised is reduced by a factor of d. An SDA-frame obtained from three different images is shown in Figure 2.

Fig. 2: An SDA-frame (d) formed as the average of the 1st (a), 2nd (b), and 3rd (c) frames.

The proposed method is inspired by the fact that although the denoising filter is designed to remove random noise originating from the camera sensor (e.g., readout noise, shot noise, dark current noise), as well as noise caused by processing (e.g., quantization and compression), it cannot do a perfect job. Therefore, some scene content leaks into the extracted noise pattern. Averaging in the spatial domain acts as a preliminary filter that smoothens the image and potentially reduces the content noise that leaks into the extracted noise pattern. Of course, the effectiveness of the approach then depends on the nature of the two noise signals. Below we analyze this fact and characterize the relationship between the noise signal arrived at by the conventional approach and by the SDA approach.

Further, when using the proposed approach, many questions arise.
First, does frame averaging lead to a drop in the accuracy of the computed fingerprint compared to the conventional method, assuming the same number of images is used for both? If so, what is the trade-off between the decrease in computation and the loss in accuracy? Can accuracy be increased by utilizing more images in the SDA method? If so, what is the optimal combination of averaging and denoising that yields the least computation with the best performance? We investigate these questions both theoretically and experimentally. We first provide a mathematical analysis using a simple framework in the two subsections below. We then validate our study in the next section by providing experimental results. The results show that the spatial domain averaging strategy can indeed result in significant savings in computation while maintaining performance and, in some cases, improving it.

The rest of this section provides an analysis of spatial domain averaging. To this end, we first analyze the conventional method and then the SDA method.
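The grouping-and-averaging step at the heart of the method can be sketched as follows. This is a minimal illustration; the function name and the choice to drop leftover frames that do not fill a group are ours, not the paper's.

```python
import numpy as np

def sda_frames(frames, d):
    """Split m frames into g = m // d disjoint groups of SDA-depth d and
    return the arithmetic mean of each group (one SDA-frame per group)."""
    m = len(frames)
    g = m // d  # leftover frames beyond g * d are dropped in this sketch
    stack = np.asarray(frames[:g * d], dtype=np.float64)
    return stack.reshape(g, d, *stack.shape[1:]).mean(axis=1)
```

For a 300-frame video with d = 50, this yields 6 SDA-frames, so the subsequent (expensive) denoising step runs 6 times instead of 300.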
A. Conventional method
As discussed in Section II, in the conventional method the camera fingerprint is estimated from n images from a known camera. Each image I can be modeled as I = I^(0) + I^(0) K + ψ, where ψ is the random noise accumulated from a variety of sources (as in (1)) and K is the PRNU noise.

To estimate K, a denoising filter F, such as [32] or BM3D [33], is used to estimate the noise-free signal I^(0). Using such a filter, we denote the noise residual as W = I^(0) K + ψ + ξ, where ξ is the content noise. This noise is essentially due to the sub-optimal denoising filter, which is unable to completely separate the content from the PRNU noise. Then, from the n known images, the camera fingerprint estimate, K̂, can be obtained using Maximum Likelihood Estimation (MLE) as

K̂ = (Σ_{i=1}^{n} W_i I_i) / (Σ_{i=1}^{n} I_i²),    (2)

where W_i is the noise pattern extracted from I_i.

Note that in the estimated camera fingerprint, K̂, ψ and ξ are the unwanted noise. The quality of K̂ can be assessed from its variance Var(K̂) [34]: the lower the variance (i.e., for images with smooth content), the higher the quality. Assuming that ψ and ξ are independent white Gaussian noise with variances σ_ψ² and σ_ξ², respectively, Var(K̂) can be bounded (using the Cramer-Rao Lower Bound, as shown by Fridrich et al. [34]) as

Var(K̂) ≥ (σ_ψ² + σ_ξ²) / (Σ_{i=1}^{n} I_i²).    (3)

Thus a better PRNU estimate is obtained for lower σ_ψ² and σ_ξ² (i.e., high-luminance and low-texture images [34]).

B. Proposed SDA method
In this subsection, we derive the variance of the estimated camera fingerprint obtained using frame averaging. We then compare this variance with that obtained by the conventional approach (in (3)).

Suppose I_1, I_2, . . . , I_m are m images used to compute the camera fingerprint using the SDA method. With frame averaging, these m images are divided into g = m/d disjoint sets of equal size, with d pictures in each set. From each set, an SDA-frame is computed. Thereafter, the process is similar to the conventional approach: each SDA-frame is denoised, and the camera fingerprint is computed from the g noise residuals using MLE. Suppose I_i^SDA is the SDA-frame obtained from the i-th image set. Then

I_i^SDA = (1/d) Σ_{j=(i−1)d+1}^{id} I_j = (1/d) Σ_{j=(i−1)d+1}^{id} (I_j^(0) + I_j^(0) K + ψ_j).

We can write the above equation as

I_i^SDA = I_i^(0),SDA + I_i^(0),SDA K + ψ_i^SDA,    (4)

where I_i^(0),SDA is the noise-free image and ψ_i^SDA is the random noise (from pre-filtering sources) in the SDA-frame. This noise can be written as ψ_i^SDA = (1/d) Σ_{j=(i−1)d+1}^{id} ψ_j. Suppose σ_ψ² is the variance of the ψ's (assumed to be white Gaussian noise). Then the variance of ψ_i^SDA turns out to be σ_ψ²/d.

Suppose W^SDA is the noise residual of an SDA-frame, I^SDA. Then

W^SDA = I^SDA − F(I^SDA) = I^(0),SDA K + ψ^SDA + ξ′,

where F is the denoising filter and ξ′ = I^(0),SDA − F(I^SDA) is the content noise due to the sub-optimal nature of the denoising filter. Note that ξ′ is assumed to be independent of the PRNU signal I^(0),SDA K (although ξ′ contains content leakage from I^(0),SDA − F(I^SDA)), as ξ′ is negligible compared to I^SDA K [34].

We know that ξ′ depends on the smoothness of the SDA-frames.
If the frames contain textured content, ξ′ is high. Assuming that SDA-frames have smoothness similar to the input frames from which they are created, we take ξ′ and ξ to have the same variance σ_ξ².

Using MLE, the camera fingerprint can now be estimated from the g SDA-frames I_1^SDA, I_2^SDA, . . . , I_g^SDA as

K̂_SDA = (Σ_{i=1}^{g} W_i^SDA I_i^SDA) / (Σ_{i=1}^{g} (I_i^SDA)²).

Using the Cramer-Rao Lower Bound, the variance of the estimated fingerprint K̂_SDA becomes
Var(K̂_SDA) ≥ (σ_ψ²/d + σ_ξ²) / (Σ_{i=1}^{g} (I_i^SDA)²).    (5)

In the ideal case, the averaging operation does not degrade the quality of the PRNU estimated from the SDA-frames. In other words, we want Var(K̂_SDA) to be approximately equal to the variance from the conventional method, Var(K̂). That is, using the results from (3) and (5), it is desired that

(σ_ψ²/d + σ_ξ²) / (Σ_{i=1}^{g} (I_i^SDA)²) ≈ (σ_ψ² + σ_ξ²) / (Σ_{i=1}^{n} I_i²).
By simplifying the above equation, we get

(σ_ψ²/d + σ_ξ²) / (σ_ψ² + σ_ξ²) ≈ (Σ_{i=1}^{g} (I_i^SDA)²) / (Σ_{i=1}^{n} I_i²).

Suppose

(Σ_{i=1}^{g} (I_i^SDA)²) / (Σ_{i=1}^{n} I_i²) = (g/n) × k, where k = ((Σ_{i=1}^{g} (I_i^SDA)²)/g) / ((Σ_{i=1}^{n} I_i²)/n).

Note that the temporary variable k is less than or equal to 1, as the numerator (Σ_{i=1}^{g} (I_i^SDA)²)/g is less than or equal to the denominator (Σ_{i=1}^{n} I_i²)/n. Putting these values in the above equation, we get

(g/n) × k ≈ (σ_ψ²/d + σ_ξ²) / (σ_ψ² + σ_ξ²).

Putting g = m/d in the above equation, we get

(m × k) / (d × n) ≈ (σ_ψ² + d σ_ξ²) / (d (σ_ψ² + σ_ξ²)),

or

m ≈ (n/k) × (σ_ψ² + d σ_ξ²) / (σ_ψ² + σ_ξ²).    (6)

We then discard the temporary variable k from the equation. Since 0 < k ≤ 1, the final relation becomes

m ≥ n × (σ_ψ² + d σ_ξ²) / (σ_ψ² + σ_ξ²).    (7)

From (7), we can derive the following concluding remarks:
• Since d ≥ 1, the right-hand side of (7) is at least n. Therefore, the number of images required by the proposed SDA method (i.e., m) will be greater than or equal to the number required by the conventional method (i.e., n).
• For smooth images, σ_ξ² is close to zero, so the impact of the SDA-depth d is negligible. For such images, SDA and the conventional approach will have similar performance, but the SDA technique will be up to d times faster.
• For textured images, when the number of images for both techniques is equal (i.e., m = n), the conventional approach is expected to outperform SDA because σ_ξ² is greater than zero.
• Since σ_ξ² is greater than zero for textured images, the ratio of images required by the SDA approach to those required by the conventional approach, m/n, will increase as the SDA-depth d increases. Therefore, the SDA approach will require more images to achieve the same performance on textured images.

Notice that it is hard to characterize the relationship between σ_ψ² and σ_ξ²; moreover, σ_ψ² depends on various factors such as shot noise, exposure time, temperature, illumination, and image content.
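The bound in (7) can be checked numerically. The variance values below are arbitrary illustrative choices, not measurements from the paper.

```python
def sda_image_requirement(n, d, var_psi, var_xi):
    """Lower bound (7) on m, the number of images the SDA method (depth d)
    needs to match a conventional fingerprint built from n images.
    var_psi and var_xi are the variances of the random noise (psi) and the
    content noise (xi), respectively."""
    return n * (var_psi + d * var_xi) / (var_psi + var_xi)

# Smooth images (var_xi ~ 0): the bound stays at n regardless of depth.
smooth = sda_image_requirement(n=30, d=50, var_psi=1.0, var_xi=0.0)
# Textured images (var_xi comparable to var_psi): the bound grows with d.
textured = sda_image_requirement(n=30, d=10, var_psi=1.0, var_xi=1.0)
```

With var_xi = 0 the bound equals n = 30 even at depth 50, while with equal variances and d = 10 it rises to 165, matching the remarks that smooth content tolerates large depths and that m/n grows with depth for textured content.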
Therefore, we do not focus on their relationship in this work. In the following section, we experimentally validate the observations listed above.

IV. VALIDATION OF ANALYSIS
In this section, we experimentally verify the main conclusions arrived at by the analysis in the previous section. In our experiments we use both flatfield and textured images from the VISION dataset [13]. The implementation was done in Matlab 2016a on a Windows 7 PC with 32 GB memory and an Intel Xeon E5-2687W v2 @ 3.40 GHz CPU. The wavelet denoising algorithm [32] was used to obtain fingerprints and PRNU noise, and PCE and NCC were used for comparison. A preset threshold of 60 [35] was used for PCE values: values above this threshold were taken to conclude that the two media objects originated from the same camera.
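The PCE statistic used for matching can be sketched as the squared correlation peak over the mean off-peak energy of the circular cross-correlation surface. This is an illustrative implementation; the size of the exclusion neighborhood around the peak is our choice, not necessarily the one used in [35].

```python
import numpy as np

def pce(residual, fingerprint, exclude=5):
    """Peak-to-Correlation-Energy sketch: squared correlation peak divided
    by the mean squared value of the off-peak correlation surface."""
    a = residual - residual.mean()
    b = fingerprint - fingerprint.mean()
    # Circular cross-correlation surface via FFT.
    cc = np.real(np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))))
    peak_idx = np.unravel_index(np.argmax(np.abs(cc)), cc.shape)
    peak = cc[peak_idx]
    # Exclude an (2*exclude+1)^2 neighborhood around the peak (wrapping).
    mask = np.ones(cc.shape, dtype=bool)
    ys = [(peak_idx[0] + dy) % cc.shape[0] for dy in range(-exclude, exclude + 1)]
    xs = [(peak_idx[1] + dx) % cc.shape[1] for dx in range(-exclude, exclude + 1)]
    mask[np.ix_(ys, xs)] = False
    energy = np.mean(cc[mask] ** 2)
    return float(peak ** 2 / (energy + 1e-12))
```

On synthetic residuals, a residual carrying the fingerprint pattern produces a PCE far above the 60 threshold, while an unrelated residual stays far below it.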
A. Studying the effect of smoothness
To verify the observations of the analysis related to the smoothness of the images used to compute a camera fingerprint, we randomly selected 50 flatfield and 50 textured images from each camera in the dataset. For each of these types, five experiments were conducted using random image sets of five different sizes for computing the fingerprint. For example, when we chose 30 flatfield images, we created one fingerprint using the conventional approach by denoising each of the 30 images and then averaging the PRNU noise patterns to arrive at the fingerprint estimate. A fingerprint estimate using the SDA approach was then computed by first averaging the same 30 images in the spatial domain and then denoising this SDA-frame of depth 30 to directly arrive at another fingerprint estimate. Therefore, a total of 20 fingerprints was obtained for each camera (2 types of images × 2 fingerprint extraction techniques × 5 cardinalities of image sets).

Each of these fingerprints was correlated with the PRNU noise obtained from the rest of the images in the dataset taken with the same camera. This test set consisted of both textured and flatfield images. To create an abundance of test cases, we divided each full-resolution fingerprint into disjoint blocks and correlated them with the corresponding blocks in the test images to match the PRNU noise, resulting in a large number of comparisons.

Fig. 3: The effect of texture in terms of PCE.

Fig. 3 shows how image content affects the PCE for fingerprints obtained from flatfield and textured image sets of each size. The figure shows that with flatfield images, despite the significantly lower number of denoising operations performed by the SDA approach, the results obtained are similar to the conventional approach. This observation holds regardless of the number of images averaged for fingerprint extraction.
The performance of the SDA approach drops for textured images. However, this difference can be overcome by increasing the number of images used by the SDA technique while still keeping the number of denoising operations lower than in the conventional approach. We investigate this issue in the next subsection.

If we consider the above results in terms of TPR, the SDA approach starts doing better once the PCE is thresholded at a set value (60 in our case) to arrive at the attribution result; a drop in PCE does not necessarily result in a wrong decision. This improvement can be observed in Fig. 4, which shows the TPR for the same experiments when the threshold is set to 60, as proposed in [35]. The other implications of these figures are already well known in the field (i.e., flatfield images are better than textured ones, and as the number of images increases, the quality of the fingerprint also increases, resulting in higher PCE and TPR).

Fig. 4: The effect of texture in terms of TPR.

Table I shows the average time it takes to extract a fingerprint estimate by the two methods in the above experiment. Notice that in both cases the same number of images, m, is read from disk, but the SDA technique needs only one denoising operation, whereas the conventional method performs m denoising operations. This implies that as the number of training images increases, the speedup also increases, as the table shows.

TABLE I: Average time to extract fingerprints with the proposed and conventional methods (in sec); columns correspond to the five image-set sizes, smallest to largest.

SDA          | 4.97  | 5.99  | 8.22  | 10.35  | 14.49
Conventional | 21.57 | 40.81 | 79.96 | 118.79 | 196.59
Speedup      | 4.34  | 6.81  | 9.73  | 11.48  | 13.57
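The timing pattern in Table I is consistent with a simple linear cost model (ours, for intuition only): both methods read all m images, but in this experiment SDA denoises a single averaged frame while the conventional method denoises all m. The per-operation times below are hypothetical.

```python
def extraction_speedup(m, t_read, t_denoise, n_denoise_sda=1):
    """Ratio of conventional to SDA fingerprint-extraction time under a
    simple linear cost model: both methods read m images; the conventional
    method denoises m times, SDA denoises n_denoise_sda times."""
    conventional = m * (t_read + t_denoise)
    sda = m * t_read + n_denoise_sda * t_denoise
    return conventional / sda
```

As m grows, the read cost is incurred by both methods but the denoising saving scales with m, so the speedup increases with the training-set size, matching the trend in Table I.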
B. Fingerprint equivalence for textured images
For textured images, our analysis indicated that more images are needed by the SDA method, and hence a corresponding reduction in the speedup would occur. In this experiment, our goal is to investigate how many images the SDA method requires, compared to the conventional approach, to yield similar performance for textured images while still retaining a speedup in fingerprint computation. This experiment was again performed using images from the VISION dataset [13]. We created a training set of textured images for each camera in the dataset and built fingerprints from increasingly large image sets using the conventional approach, and likewise using the SDA method. As in the previous experiment, each fingerprint was partitioned into disjoint blocks, and correlations were computed with the corresponding blocks of the test PRNU noise patterns.

Fig. 5: Fingerprint equivalence for the SDA and conventional approaches. The x-axis indicates the number of images used by the conventional method. The left y-axis (red) is the number of images required by SDA, and the right y-axis (blue) is the speedup gained in that case.

Figure 5 shows the number of images required by the SDA approach to achieve at least the same TPR as the conventional approach, along with the speedup gained in these cases. For example, when a fingerprint is created from textured images the conventional way, the same TPR can be achieved by the SDA approach using a few times as many images while still extracting the fingerprint several times faster. The figure shows that, by using a few times more images for the SDA method, a substantial speedup can be achieved with no loss in TPR even when the images are textured.
In Section III, we showed that when the number of images used for fingerprint extraction is held constant, the TPR is expected to drop as the SDA-depth increases. To verify this remark, we used only textured images for fingerprint extraction; flatfield images were excluded because they result in a negligible performance difference between SDA and conventional fingerprints.

We then created fingerprints from the textured images of each camera in the VISION dataset at six SDA-depth settings, producing correspondingly fewer SDA-frames as the depth increases. The SDA-frames were denoised and then averaged to arrive at the final fingerprint estimate. For each fingerprint estimate computed, the rest of the images were used as test images. We correlated each
fingerprint with the PRNU noise extracted from the test images in a block-wise manner, as done in previous experiments. Notice that SDA-1 is the same as the conventional approach.

PCE | 652.8 | 514.6 | 390.0 | 332.2 | 285.0 | 252.4
TPR | 0.80  | 0.78  | 0.75  | 0.72  | 0.69  | 0.67
(columns ordered by increasing SDA-depth; the leftmost column, SDA-1, is the conventional approach)
TABLE II: The effect of SDA-depth on PCE and TPR.

Table II shows that as the SDA-depth increases, the average PCE decreases. For textured images, the more images we combine to create an SDA-frame, the lower the resulting PCE and TPR values. This supports the third observation of the analysis in Section III.

This section has provided a validation of Section III by experimentally supporting all three observations derived from the analysis. Namely, when images are not textured, and hence post-filtering noise is low, SDA and conventional fingerprints computed from the same images perform similarly, in which case SDA yields a speedup of up to roughly 13.5 times (Table I). On the other hand, textured images and larger SDA-depths require a higher number of images to achieve the same performance as the conventional approach; yet a substantial speedup can still be achieved in most cases.

In the next section, we apply the proposed approach to practical problems and show that SDA fingerprints can perform with significantly higher accuracy or result in significant speedup compared to state-of-the-art fingerprint extraction techniques.

V. APPLICATION TO COMPUTING VIDEO FINGERPRINTS
In this section, we investigate a more practical use case of the proposed SDA technique: its use for extracting FEs from videos. As Section II explains, two of the most common ways to extract a fingerprint from a video are using only I-frames or using all frames (or the first n frames). While the former results in low performance, the latter can be impractical in many real-life applications due to very high computational needs. For example, computing fingerprints from − minute videos (i.e., approximately frames per video) using a single thread may take up to a day. In this section, we provide experimental results that demonstrate how the SDA approach can significantly reduce the time needed to compute fingerprint estimates from video, while retaining the performance obtained using a significantly larger number of denoising operations with conventional approaches.

In each experiment below, three different types of fingerprints (i.e., I-frames only, SDA-frames, and ALL-frames) were obtained from each video. For brevity, we refer to them as I-FE (i.e., I-frame Fingerprint Estimate), SDA-FE, and ALL-FE, respectively. Moreover, in some cases, we add an indication of the SDA-depth when we need to highlight it. For example, SDA-50-FE indicates that the video frames were divided into groups of and each group was averaged to create an SDA-frame.

In the first experiment, we examine source matching for videos; that is, given two videos, can we determine whether they are from the same camera? Next, we investigate a more difficult case involving mixed media. In that subsection, we also analyze an important question related to mixed media: "What choice of SDA-depth best balances speed and performance?" In the next two subsections, we examine the performance achieved with videos and images obtained from social media such as Facebook and YouTube. Finally, we show how the proposed technique can be used for source attribution with moderate-length stabilized videos (i.e., up to minutes), from which obtaining a "reliable" FE might otherwise take a couple of hours each when using all frames.

Two datasets were used in the experiments: NYUAD-MMD and VISION. The NYUAD-MMD dataset contains images and videos of different resolutions and aspect ratios from cameras of different models and brands, which makes it a challenging dataset for mixed media attribution. Moreover, it contains stabilized videos longer than minutes from cameras. Hence, we used this dataset for the experiments with mixed media and stabilized video. The videos in the dataset are typically around seconds long (i.e., each video is approximately frames) and the images are pristine (i.e., no out-camera operations). The VISION dataset contains different high-quality videos and images from social media such as Facebook and YouTube. Hence, we used this dataset in the experiments involving social media.

A. Matching Two Non-Stabilized Videos
In the first experiment, we examine source matching for videos using FEs computed with the three approaches presented above. Our goal was to estimate the length of video, and the resulting computation time, needed to achieve greater than TPR for I-FEs, SDA-FEs, and ALL-FEs. This allows a clear comparison of the three approaches.

FEs were first created from the non-stabilized videos of the same resolution in the VISION dataset. FEs were extracted from the first , , . . . seconds of each video using the two techniques mentioned in Section II and the proposed method. On average, each video had approximately one I-frame per second. We selected an SDA-depth of , resulting in one SDA-frame per second of video.
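The grouping-and-averaging step underlying an SDA-FE can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: `denoise` here is a trivial neighbourhood-average stand-in for the wavelet-based denoising filter actually used for PRNU extraction, and the post-processing steps of the full pipeline (zero-meaning, Wiener filtering) are omitted.

```python
import numpy as np

def denoise(img):
    # Stand-in smoothing filter: in the actual pipeline this would be the
    # wavelet-based denoiser used for PRNU extraction.
    return (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
            np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4.0

def sda_fingerprint(frames, depth):
    """Estimate a fingerprint from `frames` using the given SDA-depth.

    depth = 1 reduces to the conventional approach (denoise every frame);
    depth = len(frames) denoises a single averaged frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    residuals = []
    for start in range(0, len(frames), depth):
        sda_frame = frames[start:start + depth].mean(axis=0)  # spatial-domain average
        residuals.append(sda_frame - denoise(sda_frame))      # one denoising op per group
    return np.mean(residuals, axis=0)  # final fingerprint estimate
```

With depth d, a video of n frames costs only ⌈n/d⌉ denoising operations instead of n, which is the source of the speedup.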
Fig. 6: TPR for different lengths of video using I-FEs, SDA-FEs, and ALL-FEs
Figure 6 shows the TPR using I-FEs, SDA-FEs, and ALL-FEs as the length of the videos increases. As seen, SDA-FEs outperform ALL-FEs in this setting for all video lengths. The difference varies between . (for -sec videos) and . (for -sec videos). Both FEs achieve significantly higher TPR than I-FEs. For example, for -second videos, SDA-FEs and ALL-FEs result in . and . TPR, respectively, whereas I-FEs can only reach . TPR. The highest TPR achieved using I-FEs was . (i.e., for -second videos), which is still lower than the TPR of SDA-FEs and ALL-FEs computed from only -second videos (i.e., more than ). This is because SDA-FEs and ALL-FEs use all the frames in a 5-second video (i.e., I-, B-, or P-frames), whereas the I-FEs use only I-frames on average and "waste" the rest of the frames. Hence, for this setting, I-FEs fail to reach an accuracy comparable to the other two methods.

TABLE III: Time for video fingerprint extraction (in seconds)
type      averaging   I/O + denoising   total
I-FE          0             50            50
SDA-FE       12             50            62
ALL-FE        0           1407          1407
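The relative costs in Table III can be checked directly; the figures below are taken from the table (per-video averaging, I/O plus denoising, and total times in seconds), and everything else is simple arithmetic:

```python
# Per-approach extraction times from Table III (seconds, one 50 s Full HD video).
times = {
    "I-FE":   {"averaging": 0,  "io_denoise": 50,   "total": 50},
    "SDA-FE": {"averaging": 12, "io_denoise": 50,   "total": 62},
    "ALL-FE": {"averaging": 0,  "io_denoise": 1407, "total": 1407},
}

# Sanity check: the components add up to the reported totals.
for name, t in times.items():
    assert t["averaging"] + t["io_denoise"] == t["total"], name

# Speedup of SDA-FE over ALL-FE for videos of equal length.
speedup = times["ALL-FE"]["total"] / times["SDA-FE"]["total"]
print(round(speedup, 1))  # 22.7, consistent with the speedup row of Table IV
```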
We then estimated the time required to extract each type of FE from a -second Full HD video captured at 30 FPS. Table III compares the average times. It takes , , and seconds for an I-FE, SDA-FE, and ALL-FE, respectively. However, these times apply when each FE is obtained from a -second video. When we evaluate the time required to achieve TPR, less than seconds of video is needed for SDA-FEs and ALL-FEs, whereas I-FEs require seconds of video. This suggests that the required times for SDA-FEs and ALL-FEs are less than and seconds, respectively. Hence, the SDA technique is at least times faster than I-FEs and requires times shorter videos, yet still achieves a higher TPR. Moreover, it performs up to . higher than ALL-FEs in terms of TPR and speeds up extraction approximately . times in this setting. Furthermore, while SDA-FEs can achieve TPR with -second videos, ALL-FEs need -second videos for the same. Therefore, a speedup of close to times can be achieved in this case when the SDA-depth is set to . Notice that these results involve videos that did not undergo any processing such as scaling or the recompression applied by social media. Also, all videos in the VISION dataset were taken with high luminance. Therefore, performance may be lower on more difficult datasets, such as when videos are dark or processed. However, our intention here was to first demonstrate the effectiveness of the SDA approach for the simplest of cases. We examine more challenging situations in the experiments below.

B. Mixed Media Attribution
As we saw in the previous subsection, using I-FEs causes a significant drop in TPR, whereas − seconds of video is enough to achieve more than TPR for both SDA-FEs and ALL-FEs. In this subsection, we investigate a more challenging scenario in which a video FE needs to be matched with a single query image. In [14], source attribution with mixed media was investigated using the NYUAD-MMD dataset, a very challenging dataset containing images and videos of various resolutions from cameras. Here, we performed the "train on videos, test on images" experiment for I-FEs, SDA-FEs, and ALL-FEs. That is, a camera FE was computed from the video, and the query image was cropped and resized and its PRNU matched with the FE. The resizing and cropping parameters used to perform the matching were obtained from the "train on images, test on videos" experiment done in [14].

The videos in this dataset were typically around seconds long, each having approximately frames. The dataset contains a total of non-stabilized videos and images from those cameras. Each video FE was correlated with the PRNU noise of all the test images from the same camera to estimate the "true cases," which resulted in correlations. Then, each video FE from the i-th camera was compared with the PRNU noise of images from the (i+1)-th camera, using the resizing and cropping parameters that maximize the PCE for the image FE (i.e., the FE obtained from all images of the camera using the conventional approach). This way, we estimated the "false cases," resulting in correlations.

In the previous experiment we used a fixed SDA-depth, d, of . In this experiment we used different SDA-depths to investigate their impact on performance and speed. Given a video of m frames (in our case approximately frames), we divided the frames into groups of d = 1, , , , , , .
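For a video of m frames and depth d, the number of SDA-frames, and hence of denoising operations, is g = ⌈m/d⌉. A quick sketch, assuming m = 1200 frames (a single SDA-frame is produced when d = 1200) and the depth set listed in the columns of Table IV:

```python
import math

m = 1200  # assumed frame count per video
depths = [1, 5, 10, 30, 50, 200, 1200]  # d = 1 is the conventional approach

# Number of SDA-frames, i.e. denoising operations, per depth.
groups = {d: math.ceil(m / d) for d in depths}
print(groups)  # {1: 1200, 5: 240, 10: 120, 30: 40, 50: 24, 200: 6, 1200: 1}
```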
Therefore, the number of SDA-frames, g, became , , , , , , respectively. When d = 1, the technique is the same as using all frames, whereas when d = 1200, only a single SDA-frame is created by averaging all frames. After obtaining the PCE of the "true" and "false" cases, we created an ROC curve for each video FE type and depth. Figure 7 shows the ROC curves for each of the SDA-FEs of different depths, as well as the I-FE and ALL-FE. The results show that ALL-FE yields the highest performance, whereas I-FE performs significantly worse than the others. The proposed SDA method performs close to the ALL-FE method for all depths.

Fig. 7: The ROC curves for varying SDA-depths

Table IV shows more detailed results. |PCE| stands for the average of the PCE ratios with respect to I-FEs. For example, when an ALL-FE from the i-th video is correlated with the noise of the j-th image, its PCE is on average . times higher compared to the I-FE obtained from the same video. The reason
we used such a normalization instead of the average PCE is that outliers have a big impact on the average PCE. Moreover, the table shows the TPR for a PCE threshold of , the average time to extract a FE, and the speedup compared to ALL-FEs. As seen, the results indicate that the TPR of the SDA method is very close to that of ALL-FE. However, a speedup of up to times can be achieved using the SDA method.

TABLE IV: Detailed information for mixed media attribution
           I-    ALL-      5     10     30     50    200   1200
|PCE|
TPR (%)          64.0
speedup  28.1     1.0    5.1    9.9   22.7   29.3   44.0

Similar to the previous experiment, using I-FEs yields significantly lower accuracy (at least lower TPR). Moreover, when the SDA-depth is ≥ , SDA-FEs are faster to extract than I-FEs. Notice that when ALL-FEs are used, it takes approximately five days to extract all the FEs from the videos in the NYUAD-MMD dataset using a single-threaded implementation. This kind of cost is clearly impractical for many applications.

C. Train and test on YouTube videos
This experiment explores the performance achieved when two video FEs from YouTube are correlated. Although this experiment is essentially the same as that in Section V-A, it is relevant in practice because high compression is involved. Note that a key motivation of the SDA approach is that when high compression is used, a large number of frames is needed to compute a reliable FE. We created FEs from all non-stabilized YouTube videos in the VISION dataset (i.e., the ones labeled flatYT, indoorYT, and outdoorYT) using only I-frames, SDA- , SDA- , SDA- , and ALL-frames. Here, we used the first , , . . . seconds of the YouTube videos to extract FEs. Each -second video had approximately frames that were used for the SDA- or ALL-FEs, whereas it contained . I-frames on average. After fingerprint extraction, we correlated each video FE with the others of the same type and same length taken by the same camera. For example, an I-FE from seconds of video is correlated with all I-FEs obtained from the rest of the -second videos from the same camera. The same was done for the SDA- and ALL-FEs. This way, a total of correlations were done for each type.

Fig. 8: The effect of FE type and video length on TPR for YouTube videos

Figure 8 shows the TPR for varying lengths of video for each FE type. The figure shows that I-FEs perform very poorly in all cases, and any FE type created from more than seconds of video outperforms I-FEs. While ALL-FEs perform better than SDA-FEs for same-length videos, this difference can be overcome by increasing the video length while still using far fewer denoising operations. For example, SDA− obtained from -second videos, or SDA− from -second videos, performs approximately the same as ALL-FEs obtained from seconds (within a ± TPR range). Hence, instead of using frames for ALL-FEs, using frames for SDA− can result in a significant speedup with no loss in TPR. While an ALL-FE from frames of a Full HD video takes seconds to compute, an SDA− FE from frames, which performs only denoising operations instead of , takes seconds to compute. Therefore, a speedup of close to times can be achieved with SDA− with a increase in TPR. Notice that, because most videos in the VISION dataset are around seconds long, this limits the maximum length we could use in our experiments.

D. Train on Facebook images, test on YouTube videos
From the previous experiments, we know that the SDA method achieves a significant speedup for both videos and images, with a small loss in performance that can be overcome by increasing the number of still images used for fingerprint extraction, when available. In this experiment, our goal was to show that the proposed method can be successfully applied to other social media. Specifically, in this subsection, we extract FEs from Facebook images and match them with the FEs of YouTube videos. We call this the "train on Facebook images, test on YouTube videos" experiment. This experiment matters because both media-sharing services contain billions of visual media items, and computing ALL-FEs from these collections has a very high time complexity. Therefore, faster fingerprint extraction methods (along with search techniques) that speed up attribution are badly needed. In this experiment, for the cameras in the VISION dataset that had non-stabilized videos, we created a FE from
Facebook images (i.e., the ones labeled FBH) using the conventional fingerprint computation method. We then used the FEs from non-stabilized YouTube videos (those created in the previous experiment). We again used I-frames, SDA- , SDA- , SDA- , and ALL-frames computed from the first seconds of the YouTube videos. We then correlated the image FE of a camera with the FE of each video of each type using the efficient search proposed in [14]; a total of pairs were compared for each FE type. Table V shows the TPR of these correlations. Similar to the "train on videos, test on images" experiment, these results show that FEs obtained from Facebook images match the YouTube videos with . TPR for SDA- , which is higher than both ALL-FEs and I-FEs. On the other hand, FEs from I-frames yield approximately lower TPR. These results show that the SDA approach is a good replacement for I-FEs or ALL-FEs in this scenario.

TABLE V: TPR of different FE types when a FE from Facebook images and another from YouTube videos are correlated
        I-FE   SDA-   SDA-   SDA-   ALL-FE
TPR    51.60   81.4   79.88  78.13   79.59
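Throughout these experiments, matches are decided by the Peak-to-Correlation Energy (PCE) between a fingerprint estimate and a query noise residual. A minimal numpy sketch of the usual PCE computation follows (circular cross-correlation via FFT, with the peak energy compared against the average energy elsewhere); the exclusion-window size and the simplifications here are assumptions, not the exact implementation used in the paper:

```python
import numpy as np

def pce(fingerprint, residual, exclude=5):
    """Peak-to-Correlation Energy between two equal-size 2-D arrays."""
    f = fingerprint - fingerprint.mean()
    r = residual - residual.mean()
    # Circular cross-correlation via the FFT.
    xcorr = np.real(np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(r))))
    py, px = np.unravel_index(np.argmax(np.abs(xcorr)), xcorr.shape)
    peak = xcorr[py, px]
    # Energy of the correlation surface, excluding a window around the peak.
    mask = np.ones_like(xcorr, dtype=bool)
    mask[max(0, py - exclude):py + exclude + 1,
         max(0, px - exclude):px + exclude + 1] = False
    return peak ** 2 / np.mean(xcorr[mask] ** 2)
```

A residual from the fingerprint's own camera should produce a markedly higher PCE than one from a different camera; a detection threshold (such as the one used for Table IV) is then applied to this ratio.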
E. Matching two stabilized videos
A recent work [12] has shown that a FE obtained from a long stabilized video can successfully be matched with other videos from the same camera. However, thousands of frames must be denoised, which may not be practical in many circumstances. A potential alternative is the SDA method, which may lead to a significant speedup. To evaluate this, we captured stabilized videos from cameras. A total of videos were captured, adding up to minutes. We extracted FEs from the frames of , , . . . -second video lengths using the conventional (I-frame and ALL-frame) methods as well as the SDA method with SDA-depths of , , and . These depths were deemed reasonable choices based on the previous experiments. As shown in [8], [10], [11], the first frame of a video is typically not geometrically transformed. Since we divide each video into pieces, some video pieces do not contain an untransformed frame, so we discarded the first frame of each video to avoid inconsistencies. We correlated each FE with the other FEs of different videos from the same camera that were created using the same number of frames. For example, the SDA−-FEs of -second videos are correlated with the same type of FEs from the same camera. Figure 9 shows the TPR for three cameras (i.e., the Huawei Honor, Samsung S8, and iPhone 6 Plus) and the overall average of all five cameras.

Fig. 9: TPR for stabilized videos for varying SDA-depths

The results show that as videos get longer, ALL-FEs and SDA-FEs achieve higher TPR. Moreover, the effect of increased SDA-depth is more significant here than for non-stabilized videos. While for some cameras ALL-FEs and SDA-FEs perform similarly (e.g., the Huawei and Samsung cameras), for others (e.g., the iPhone camera) there is a significant difference between the two. For example, for the Samsung S8,
an SDA−-FE from seconds of video performs similarly to a -second ALL-FE. Therefore, for this particular case, SDA− can speed up extraction by a factor of (i.e., ×) (see Table IV for the times). On the other hand, for the iPhone 6 Plus, ALL-FEs from seconds of video and -second SDA−-FEs have similar TPR, so a times (i.e., ×) speedup can be achieved in this case. Hence, a speedup between these two figures (i.e., and ) can be achieved without any loss in TPR if a long video is available.

Overall, this section shows that the proposed SDA-FEs outperform the commonly used I-frame-only technique in all the video cases, including mixed media, stabilized videos, and social media. Meanwhile, SDA-FEs achieve results comparable to ALL-FEs with up to a times speedup in these experiments. We also show the impact of SDA-depth on the performance that can be achieved in various cases.

VI. CONCLUSION AND FUTURE WORK
This paper has investigated camera fingerprint extraction using Spatial Domain Averaged frames, which are the arithmetic means of multiple still images. By adding one extra averaging step before denoising, a significant speedup can be achieved for fingerprint extraction. We show that this technique can successfully be used for images, non-stabilized videos, and stabilized videos to speed up the fingerprint extraction process. The proposed method is especially useful when the number of denoising operations needed would otherwise be very high, for example, when dealing with non-stabilized or highly compressed stabilized videos or with images from social media.

It is often assumed that for video source attribution, using only I-frames for fingerprint extraction (I-FEs) is "enough" to achieve high performance. However, in this research, we have shown that I-FEs perform poorly compared to ALL-FEs in all cases. On the other hand, using ALL-FEs is impractical due to the large computation time needed in realistic scenarios where thousands of videos may be available. The proposed SDA approach resolves both the accuracy problem of I-FEs and the speed problem of ALL-FEs: SDA-FEs and ALL-FEs perform similarly in most cases, and when the SDA method performs worse, this can be overcome by using more of the available frames, if any.

The proposed technique can also be used for other source attribution problems where many denoising operations are needed. For instance, this method can be applied when many "partially misaligned" still images and a suspect camera are available; the frames of a seam-carved video, for example, are partially misaligned with its source camera's fingerprint.
In such a scenario, instead of denoising all frames of the video, the SDA technique can be used to speed up the process. Moreover, determining whether a video is stabilized is another task that requires a number of denoising operations. As an alternative to using only I-frames, the proposed SDA technique could work with only denoising operations.

Another avenue for future research is to create an SDA-FE in a weighted manner so that the performance achieved with the SDA method can be increased. Two potential ways to achieve this are weighting I-, P-, and B-frames differently, and weighting the frames in a block-by-block manner. For example, it has been shown that flatfield images perform better with the SDA method than textured ones. Using this idea, one may weight textured regions differently from smooth regions.

REFERENCES
[1] J. Lukas, J. Fridrich, and M. Goljan, "Digital camera identification from sensor pattern noise," IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 205–214, 2006.
[2] M. Goljan, J. Fridrich, and T. Filler, "Managing a large database of camera fingerprints," in Media Forensics and Security II, vol. 7541. International Society for Optics and Photonics, 2010, p. 754108.
[3] S. Bayram, H. T. Sencar, and N. Memon, "Efficient sensor fingerprint matching through fingerprint binarization," IEEE Transactions on Information Forensics and Security, vol. 7, no. 4, pp. 1404–1413, 2012.
[4] D. Valsesia, G. Coluccia, T. Bianchi, and E. Magli, "Compressed fingerprint matching and camera identification via random projections," IEEE Transactions on Information Forensics and Security, vol. 10, no. 7, pp. 1472–1485, July 2015.
[5] S. Bayram, H. T. Sencar, and N. Memon, "Sensor fingerprint identification through composite fingerprints and group testing," IEEE Transactions on Information Forensics and Security, vol. 10, no. 3, pp. 597–612, March 2015.
[6] S. Taspinar, H. T. Sencar, S. Bayram, and N. Memon, "Fast camera fingerprint matching in very large databases," in Image Processing (ICIP), 2017 IEEE International Conference on. IEEE, 2017, pp. 4088–4092.
[7] W.-H. Chuang, H. Su, and M. Wu, "Exploring compression effects for improved source camera identification using strongly compressed video," in Image Processing (ICIP), 2011 18th IEEE International Conference on. IEEE, 2011, pp. 1953–1956.
[8] S. Taspinar, M. Mohanty, and N. Memon, "Source camera attribution using stabilized video," in Information Forensics and Security (WIFS), 2016 IEEE International Workshop on. IEEE, 2016, pp. 1–6.
[9] S. Chen, A. Pande, K. Zeng, and P. Mohapatra, "Video source identification in lossy wireless networks," in IEEE INFOCOM, 2013, pp. 215–219.
[10] M. Iuliani, M. Fontani, D. Shullani, and A. Piva, "A hybrid approach to video source identification," arXiv preprint arXiv:1705.01854, 2017.
[11] S. Mandelli, P. Bestagini, L. Verdoliva, and S. Tubaro, "Facing device attribution problem for stabilized video sequences," IEEE Transactions on Information Forensics and Security, 2019.
[12] J. Lubin, M. Isnardi, C. Spence, I. Sur, and A. Chaudhry, "Joint sensor fingerprinting and processing history recovery for visual media forensics," Private conversation, 2018.
[13] D. Shullani, M. Fontani, M. Iuliani, O. Al Shaya, and A. Piva, "Vision: a video and image dataset for source identification," EURASIP Journal on Information Security, vol. 2017, no. 1, p. 15, 2017.
[14] S. Taspinar, M. Mohanty, and N. Memon, "Source camera attribution of multi-format devices."
[15] J. Lukáš, J. Fridrich, and M. Goljan, "Digital camera identification from sensor pattern noise," IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 205–214, 2006.
[16] Y. Sutcu, S. Bayram, H. T. Sencar, and N. Memon, "Improvements on sensor noise based source camera identification," in IEEE International Conference on Multimedia and Expo, 2007, pp. 24–27.
[17] C. T. Li and Y. Li, "Color-decoupled photo response non-uniformity for digital image forensics," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 2, pp. 260–271, 2012.
[18] G. Chierchia, S. Parrilli, G. Poggi, C. Sansone, and L. Verdoliva, "On the influence of denoising in PRNU based forgery detection," in ACM Multimedia in Forensics, Security and Intelligence, 2010, pp. 117–122.
[19] C. T. Li, "Source camera identification using enhanced sensor pattern noise," IEEE Transactions on Information Forensics and Security, vol. 5, no. 2, pp. 280–287, 2010.
[20] W. Yaqub, M. Mohanty, and N. Memon, "Towards camera identification from cropped query images," in . IEEE, 2018, pp. 3798–3802.
[21] S. Bayram, H. T. Sencar, and N. Memon, "Seam-carving based anonymization against image & video source attribution," in IEEE Workshop on Multimedia Signal Processing, 2013, pp. 272–277.
[22] S. Taspinar, M. Mohanty, and N. Memon, "Prnu based source attribution with a collection of seam-carved images," in Image Processing (ICIP), 2016 IEEE International Conference on. IEEE, 2016, pp. 156–160.
[23] M. Goljan and J. Fridrich, "Camera identification from scaled and cropped images," Proc. SPIE, Electronic Imaging, Forensics, Security, Steganography, and Watermarking of Multimedia Contents X, vol. 6819, pp. 68190E–68190E–13, 2008.
[24] E. J. Alles, Z. J. Geradts, and C. J. Veenman, "Source camera identification for low resolution heavily compressed images," in Computational Sciences and Its Applications, 2008. ICCSA'08. International Conference on. IEEE, 2008, pp. 557–567.
[25] K. Rosenfeld and H. T. Sencar, "A study of the robustness of prnu-based camera identification," in Media Forensics and Security, ser. SPIE Proceedings, E. J. Delp, J. Dittmann, N. D. Memon, and P. W. Wong, Eds., vol. 7254. SPIE, 2009, p. 72540.
[26] M. Goljan, J. Fridrich, and J. Lukáš, "Camera identification from printed images," Proceedings of SPIE
[27] Signal Processing Systems, vol. 1, pp. 1–18, June 2012.
[28] M. Chen, J. Fridrich, M. Goljan, and J. Lukas, "Source digital camcorder identification using sensor photo response non-uniformity," in SPIE Electronic Imaging, 2007, pp. 1G–1H.
[29] S. McCloskey, "Confidence weighting for sensor fingerprinting," in IEEE CVPR Workshops, 2008, pp. 1–6.
[30] N. Ejaz, W. Kim, S. I. Kwon, and S. W. Baik, "Video stabilization by detecting intentional and unintentional camera motions," in IEEE International Conference on Intelligent Systems, Modelling and Simulation, 2012, pp. 312–316.
[31] Y. Matsushita, E. Ofek, W. Ge, X. Tang, and H.-Y. Shum, "Full-frame video stabilization with motion inpainting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 1150–1163, July 2006.
[32] M. K. Mihcak, I. Kozintsev, K. Ramchandran, and P. Moulin, "Low-complexity image denoising based on statistical modeling of wavelet coefficients," IEEE Signal Processing Letters, vol. 6, no. 12, pp. 300–303, 1999.
[33] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Bm3d image denoising with shape-adaptive principal component analysis," in SPARS'09, Signal Processing with Adaptive Sparse Structured Representations, 2009.
[34] J. Fridrich, "Sensor defects in digital image forensic," Digital Image Forensics, pp. 1–43, 2013.
[35] M. Goljan, J. Fridrich, and T. Filler, "Large scale test of sensor fingerprint camera identification," in