A Modified Fourier-Mellin Approach for Source Device Identification on Stabilized Videos
Sara Mandelli∗, Fabrizio Argenti†, Paolo Bestagini∗, Massimo Iuliani†‡, Alessandro Piva†‡, Stefano Tubaro∗
∗ Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
† Department of Information Engineering, University of Florence, Florence, Italy
‡ FORLAB, Multimedia Forensics Laboratory, Prato, Italy
ABSTRACT
To decide whether a digital video has been captured by a given device, multimedia forensic tools usually exploit characteristic noise traces left by the camera sensor on the acquired frames. This analysis requires that the noise pattern characterizing the camera and the noise pattern extracted from the video frames under analysis are geometrically aligned. However, in many practical scenarios this does not occur, thus a re-alignment or synchronization has to be performed. Current solutions often require a time-consuming search of the re-alignment transformation parameters. In this paper, we propose to overcome this limitation by searching scaling and rotation parameters in the frequency domain. The proposed algorithm, tested on real videos from a well-known state-of-the-art dataset, shows promising results.
Index Terms — Video forensics, sensor noise, Fourier-Mellin, PRNU, video stabilization
1. INTRODUCTION
Multimedia forensics keeps developing technologies to identify the camera originating a digital image or a digital video. Currently, the most promising technique is based on the analysis of Sensor Pattern Noise (SPN) or Photo Response Non-Uniformity (PRNU), left by the acquisition device into the visual content. This trace is useful to identify the video source since it is universal (i.e., every camera sensor introduces one) and unique (i.e., PRNUs from two different sensors are uncorrelated) [1, 2]. Moreover, PRNU has proved to be significantly robust to commonly used processing, like JPEG compression [1], or uploading to social media platforms [3, 4].

The PRNU-based source identification process consists in verifying the match between a query image or video frame and a fingerprint characterizing a reference camera. The strategy involves two main steps: i) a reference fingerprint is derived from still images or videos acquired by the source device; ii) the query fingerprint is estimated from the investigated content and then compared with the reference to verify the possible match, in the form of a correlation. If the query content was acquired by the reference camera, then a high correlation is expected.
This material is based on research sponsored by DARPA and AFRL under agreement number FA8750-16-2-0173. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA and AFRL or the U.S. Government. This work was supported by the PREMIER project, funded by the Italian Ministry of Education, University, and Research within the PRIN 2017 program.
The previous scheme works under the hypothesis of perfect geometrical alignment between the reference and test fingerprints. If a geometrical transformation is applied to the query content, a pixel grid misalignment between the query and the reference fingerprint arises, thus hindering the detection. Such a case occurs in multiple scenarios: when an image or a video has been acquired with different resolution settings, or has been cropped and resized due to upload to a social media platform; when a malicious user slightly distorts the content to remove the sensor traces; when a query video is tested against a reference estimated from still images; when a video has been created in the presence of electronic image stabilization. In all these cases, the PRNU extracted from the query is misaligned with the reference fingerprint, and thus a geometric re-synchronization between them has to be carried out before the matching operation.

The first solution to this problem was proposed in [5], where the case of cropped and downscaled images was studied. The authors show that it is possible to parameterize the Normalized Cross-Correlation (NCC) between the reference fingerprint and the query noise with respect to the scaling factor. The NCC peak position for a given scaling factor provides an estimate of the shift. While the NCC can be efficiently computed in the Fourier domain, a brute-force search is needed to determine the scaling factor. Following the same rationale, more recent papers [6, 7, 8] extend the proposed methodology by also considering rotation and the more challenging scenario of video analysis. As a matter of fact, modern acquisition pipelines usually include electronic stabilization, which undermines PRNU-based attribution techniques. In these cases, PRNU-based techniques only work if geometric transformations are properly estimated and compensated for, which is a computationally complex operation.

In this paper, we focus on the problem of camera attribution of stabilized video sequences based on PRNU. Specifically, we propose a method to align frame fingerprints with the reference PRNU by recovering the scaling, shift and rotation parameters introduced by electronic stabilization. We overcome the problem of computational complexity by searching for scaling and rotation parameters in the frequency domain thanks to a modified version of the Fourier-Mellin transform (FM). Results obtained on the well-known Vision dataset [9] show that the proposed method provides extremely efficient results whenever rotation and scaling operations are applied to video frames. When shift is also taken into account, the gain compared against the state of the art [8] depends on the video content.
2. BACKGROUND AND PROBLEM STATEMENT
In this section we introduce the background on the Fourier-Mellin (FM) transform and define the problem we are tackling in this paper.

Fourier-Mellin Transform. The FM transform enables estimating scale, rotation and shift transformations between two images in closed form [10]. Given an image $I$, the FM transform is expressed as the log-polar mapping of the magnitude of the image Fourier transform, i.e.,
$$\mathrm{FM}\{I\} = \mathrm{LP}\{|F|\}, \quad (1)$$
where $\mathrm{LP}\{\cdot\}$ is the operator computing the log-polar mapping, and $|F|$ is the magnitude of the Fourier transform.

Let us consider two images $I_a$ and $I_b$ that are linked through a similarity transformation, i.e., $I_a = T_{ab}\{I_b\}$, where $T_{ab}$ applies the transformation identified by the matrix
$$\mathbf{T}_{ab} = \begin{bmatrix} s_{ab}\cos\alpha_{ab} & -s_{ab}\sin\alpha_{ab} & c_x^{ab} \\ s_{ab}\sin\alpha_{ab} & s_{ab}\cos\alpha_{ab} & c_y^{ab} \end{bmatrix}, \quad (2)$$
where $s_{ab}$ represents scaling, $\alpha_{ab}$ rotation, and $\mathbf{c} = [c_x^{ab}, c_y^{ab}]$ horizontal and vertical shift. In this scenario, it is possible to show that $\mathrm{FM}\{I_a\}$ is a shifted version of $\mathrm{FM}\{I_b\}$. More formally,
$$\mathrm{FM}\{I_a\}(\rho, \alpha) = \mathrm{FM}\{I_b\}(\rho - \log s_{ab},\ \alpha - \alpha_{ab}), \quad (3)$$
where $\rho$ is the radial coordinate and $\alpha$ the rotational coordinate. It is therefore possible to estimate the scale $s_{ab}$ and rotation $\alpha_{ab}$ by looking at the peak position of the phase correlation function between $\mathrm{FM}\{I_a\}$ and $\mathrm{FM}\{I_b\}$, independently from shift [10]. Once $s_{ab}$ and $\alpha_{ab}$ are estimated, the two images can be realigned apart from translation. The relative shift can then be estimated by looking at the peak position of the phase correlation computed between the two realigned images in the pixel domain [10].
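For illustration, a minimal Python/NumPy sketch of the classical pipeline recalled above: the FM transform of (1) and the phase correlation whose peak position yields scale and rotation via (3). Function names and the log-polar grid size are our own choices, not prescribed by [10].

```python
import numpy as np
from scipy.ndimage import map_coordinates

def log_polar(img, n_rho=256, n_theta=256):
    """Resample a 2D array onto a log-polar grid centered on the array center."""
    rows, cols = img.shape
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    max_r = np.hypot(cy, cx)
    rho = np.linspace(0.0, np.log(max_r), n_rho)           # log-radial axis
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    r = np.exp(rho)
    ys = cy + np.outer(r, np.sin(theta))                   # sampling coordinates
    xs = cx + np.outer(r, np.cos(theta))
    return map_coordinates(img, [ys, xs], order=1, mode='constant')

def fm(img):
    """FM{I} of eq. (1): log-polar map of the centered Fourier magnitude."""
    mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    return log_polar(mag)

def phase_correlation_peak(a, b):
    """Peak location of the phase correlation between two same-size arrays."""
    cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    pc = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
    return np.unravel_index(np.argmax(pc), pc.shape)

# Per eq. (3), the peak offset between fm(Ia) and fm(Ib) along the rho axis maps
# to log(s_ab) (scaled by the grid step log(max_r)/n_rho), while the offset
# along the theta axis maps to the rotation angle alpha_ab.
```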
Problem formulation. PRNU is typically modeled as a multiplicative noise pattern introduced by any device in all acquired images or videos [1, 11]. In the field of forensic analysis, it is well known that PRNU can be exploited for inferring whether an image was shot by a certain device. For instance, given a test image $I$ and a device PRNU $K$, we can compute the Peak-to-Correlation Energy (PCE) between the noise residual $W$ extracted from the image and the PRNU pixel-wise scaled by $I$, i.e., $\mathrm{PCE}(W, K \cdot I)$. Indeed, the PCE measures the correlation between the noise traces left on $I$ and the device PRNU independently of potential shift misalignment, as the correlation peak is searched over all possible mutual shifts between them. If the PCE is greater than a confidence threshold, we attribute $I$ to the device [1, 11].

The extension of PRNU-based strategies to attributing video frames to a specific device suffers from some issues due, for instance, to higher compression rates and lower pixel resolutions. As a matter of fact, the previously described PCE test cannot be directly performed, since the PRNU resolution is typically higher than the size of the recorded video frames. Moreover, in-camera video stabilization techniques, which are now becoming one of the must-have device specifications, strongly hinder the traces left by PRNU, as video frames may be warped by means of geometrical transformations (e.g., cropping, rotation, scaling, etc.) in order to generate a stable video sequence [8, 12]. As a consequence, the attribution of a video frame to a specific device can be a much more challenging task than common image-camera attribution.

In this paper, we exploit PRNU-based traces to investigate the problem of device attribution when testing in-camera stabilized video frames. Specifically, given a device fingerprint $K$ and a frame $I$ coming from a stabilized video sequence, we aim at exploiting the PRNU traces left on $I$ in order to detect whether it has been recorded by the analyzed device. To do so, we assume that geometric transformations can be approximated by similarities [7, 8], and we propose a geometrical realignment strategy based on a modified version of the FM transform applied to both the device fingerprint and the frame noise residual. Specifically, the proposed modified Fourier-Mellin transform (MFM) enables comparing a device fingerprint and a noise residual independently from scaling and rotation operations. The next section provides all the details of the proposed method.
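As a reference point, the classical PCE test recalled above can be sketched in a few lines of NumPy. The FFT-based circular correlation and the size of the neighborhood excluded around the peak are common implementation choices, shown here as assumptions rather than the exact recipe of [1, 11].

```python
import numpy as np

def pce(w, h, excl=5):
    """Peak-to-Correlation Energy between residual w and expected pattern h = K*I."""
    w = w - w.mean()
    h = h - h.mean()
    # cross-correlation over all mutual (circular) shifts, computed via the FFT
    cc = np.fft.ifft2(np.fft.fft2(w) * np.conj(np.fft.fft2(h))).real
    py, px = np.unravel_index(np.argmax(cc), cc.shape)
    peak_energy = cc[py, px] ** 2
    # exclude a small neighborhood around the peak from the energy normalization
    rows, cols = cc.shape
    dy = np.abs(((np.arange(rows) - py + rows // 2) % rows) - rows // 2)[:, None]
    dx = np.abs(((np.arange(cols) - px + cols // 2) % cols) - cols // 2)[None, :]
    mask = (dy > excl) | (dx > excl)
    return peak_energy / np.mean(cc[mask] ** 2)

# Attribution test: attribute image I to the device if pce(W, K * I) exceeds a
# confidence threshold.
```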
3. PROPOSED METHOD
In order to attribute a video frame $I$ to a device whose reference fingerprint is $K$, we follow a pipeline based on a few steps: (i) noise extraction; (ii) geometric transformation estimation; (iii) geometric compensation and matching. In the following, we illustrate all the steps of the pipeline.
Noise extraction. As in the common PRNU-based attribution algorithm, we extract the noise residual $W$ from frame $I$. This is done using the strategy proposed in [1, 11]: (i) the noise is extracted through wavelet-based denoising; (ii) a series of post-processing steps (e.g., zero-averaging rows and columns, Wiener filtering, etc.) are applied to further enhance the noise residual $W$.
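A rough sketch of this step using PyWavelets. This is a simplified stand-in for the denoiser of [1, 11]: plain soft thresholding with a universal threshold replaces the original locally adaptive filtering, and the wavelet family, decomposition level and noise scale are our own assumptions.

```python
import numpy as np
import pywt

def noise_residual(frame, wavelet='db4', level=4, sigma=3.0):
    """Extract a noise residual W = I - denoise(I) from a grayscale frame."""
    frame = frame.astype(np.float64)
    coeffs = pywt.wavedec2(frame, wavelet, level=level)
    thr = sigma * np.sqrt(2.0 * np.log(frame.size))      # universal threshold
    den = [coeffs[0]] + [
        tuple(pywt.threshold(d, thr, mode='soft') for d in detail)
        for detail in coeffs[1:]
    ]
    denoised = pywt.waverec2(den, wavelet)[:frame.shape[0], :frame.shape[1]]
    w = frame - denoised
    # post-processing as in [11]: zero-average rows and columns to suppress
    # periodic artifacts (the further Wiener filtering step is omitted here)
    w -= w.mean(axis=1, keepdims=True)
    w -= w.mean(axis=0, keepdims=True)
    return w
```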
Geometric transformation estimation. In order to match $W$ and the fingerprint $K$, we first need to search for the geometrical transformation that might link them. In principle, assuming that video frame warping can be approximated by a similarity transformation [7, 8], aligning a noise residual and a reference device fingerprint by means of Fourier-Mellin may seem straightforward. In practice, differently from the Fourier-Mellin theory presented in Section 2, the two terms to compare (i.e., $W$ and $K$) are not exactly one the transformed version (by means of a similarity transformation) of the other. First, the geometric transformation introduced by stabilization is not necessarily a similarity, but can include perspective distortions (on the entire frame or on a localized portion of it) as well [12, 13]. Second, the noise residuals of video frames may contain scene content and noise contributions which are not present in the reference device fingerprint.

The primary consequence of this dissimilarity is that selecting only the Fourier magnitudes for estimating scale and rotation between the two terms, as reported in (3), may not be accurate. Indeed, we verified that the phase correlation between $\mathrm{FM}\{K\}$ and $\mathrm{FM}\{W\}$ does not show a pronounced peak, thus leading to a strongly hindered estimation of scale, rotation and shift. In order to overcome this issue, we modify the Fourier-Mellin pipeline in two ways.

First, we propose to embed the phase term of the Fourier transform, in addition to the magnitude, into the Fourier-Mellin pipeline. The modified Fourier-Mellin transform of $I$ can thus be defined as
$$\mathrm{MFM}\{I\} = \mathrm{LP}\{F\}, \quad (4)$$
where $\mathrm{LP}\{F\}$ is the log-polar mapping of the image Fourier transform (including magnitude and phase). On one hand, the phase adds more information, which is very useful for angle and scale estimation. On the other hand, this operation comes with a cost: we cannot isolate anymore the estimation of scale and rotation from the estimation of the shift. Indeed, in this case, the phase correlation does not exclusively depend on scale and rotation transformations, but also on the translation between the two terms. The Fourier-Mellin pipeline works only if $W$ and $K$ are almost perfectly aligned in terms of translation, i.e., if their mutual shift is basically zero pixels in both the horizontal and vertical directions. In other words, when including the Fourier phase term, we first have to correctly realign the PRNU traces left on the noise residual with those on the reference fingerprint for what concerns the relative shift; only then can we convert the Fourier transforms into the log-polar domain and estimate the remaining parameters.

Fig. 1: Scheme of the proposed method for similarity estimation between noise residual $W$ and reference device fingerprint $K$. The global optimizer searches for shift candidates, while the proposed $\mathrm{MFM}_{\Delta\rho}$ transform provides an estimate of scale and rotation for each shift.

The second proposed modification enables faster computations. It has been shown that a properly selected portion of the PRNU frequency spectrum can be sufficient to achieve good attribution performance (e.g., through subsampling [14]). In this vein, notice that a 2D frequency band becomes a rectangular band if the frequency spectrum is converted into the log-polar domain. We propose to literally cut the frequency content of $K$ and $W$ by cropping the log-polar Fourier transform to $\Delta\rho$ samples along the $\rho$ dimension. The cropping center corresponds to the coordinate of the highest energy peak of $\mathrm{MFM}\{K\}$ evaluated as a function of $\rho$. Although this step might seem irrelevant, it strongly reduces the amount of frequency samples to be correlated, thus lowering the computational cost. We define the modified Fourier-Mellin transform followed by cropping as
$$\mathrm{MFM}_{\Delta\rho}\{I\} = [\mathrm{LP}\{F\}]_{\Delta\rho}. \quad (5)$$

By considering the added phase term and the frequency cropping step, the best similarity parameters can be estimated by solving a maximization problem. Formally,
$$\hat{s}, \hat{\alpha}, \hat{\mathbf{c}} = \arg\max_{s \in \mathcal{S},\, \alpha \in \mathcal{A},\, \mathbf{c} \in \mathcal{C}} \Phi\left[\mathrm{MFM}_{\Delta\rho}\{W\},\, \mathrm{MFM}_{\Delta\rho}\{K(\mathbf{x} - \mathbf{c})\}\right], \quad (6)$$
where $\Phi$ represents the phase correlation and the vector $\mathbf{x}$ refers to horizontal and vertical pixel coordinates.

Notice that, for each shift candidate value, scale and rotation parameters can be very quickly estimated in closed form through phase correlation. Therefore, we only need to optimize over different shift values. However, gradient descent strategies to solve (6) suffer from the non-convex behavior of the phase correlation as a function of the shift. Especially in video sequences characterized by outdoor scenarios or user motion, the actual peak value can be hard to find with gradient descent algorithms. The maximization problem as a function of the shift can instead be solved by resorting to global optimization techniques. It is worth noting that the translation between $W$ and $K$ can be assumed, with slight approximation, to imply integer shifts in the horizontal and vertical directions, i.e., to represent a certain number of pixels. We propose to exploit a global optimization algorithm known as the genetic algorithm, which allows an efficient estimation of integer parameters [15]. In a nutshell, our method is shown in Fig. 1.
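A sketch of the proposed $\mathrm{MFM}_{\Delta\rho}$ transform of (4)-(5), reusing log_polar() from the sketch in Section 2. Since the complex spectrum is kept, real and imaginary parts are mapped separately; the cropping center defaults to the most energetic radial coordinate, as described above. Names and defaults are ours.

```python
import numpy as np

def mfm(img, delta_rho=None, center=None):
    """MFM{I} of eq. (4); with delta_rho set, the cropped version of eq. (5)."""
    spec = np.fft.fftshift(np.fft.fft2(img))               # complex spectrum F
    lp = log_polar(spec.real) + 1j * log_polar(spec.imag)  # log-polar map of F
    if delta_rho is None:
        return lp
    if center is None:
        # highest-energy coordinate along the rho dimension (computed on
        # MFM{K} in the actual pipeline, then reused for every W)
        center = int(np.argmax(np.sum(np.abs(lp) ** 2, axis=1)))
    lo = max(0, center - delta_rho // 2)
    return lp[lo:lo + delta_rho, :]                        # crop along rho
```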
Geometric compensation and matching. After estimating the similarity transformation $T_{\hat{s}\hat{\alpha}\hat{\mathbf{c}}}$, the last steps consist in: (i) applying $T_{\hat{s}\hat{\alpha}\hat{\mathbf{c}}}$ to $K$ in order to realign the PRNU traces left on $K$ with those of $W$; (ii) resorting to the PCE as the strategy for a correct source device identification. We compute $P_{\mathrm{MFM}}$ as
$$P_{\mathrm{MFM}} = \mathrm{PCE}(W,\ T_{\hat{s}\hat{\alpha}\hat{\mathbf{c}}}(K)). \quad (7)$$
As in standard PRNU attribution tests, by thresholding $P_{\mathrm{MFM}}$ it is possible to detect whether the frame under analysis belongs to the tested device. In case multiple frames are available, it is possible to repeat the whole procedure and fuse the results obtained with different frames (e.g., maximum PCE picking, majority voting, etc.).
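A sketch of this final step, eq. (7), reusing pce() from Section 2; scipy.ndimage.affine_transform performs the similarity warp. The centering and sign conventions of the warp, and the max-PCE fusion over frames, are our own illustrative choices.

```python
import numpy as np
from scipy.ndimage import affine_transform

def compensate(k, s, alpha_deg, c):
    """Warp fingerprint K by the similarity (s, alpha, c), rotating and scaling
    about the image center; c = (row, col) shift in pixels."""
    a = np.deg2rad(alpha_deg)
    m = s * np.array([[np.cos(a), -np.sin(a)],
                      [np.sin(a),  np.cos(a)]])
    m_inv = np.linalg.inv(m)              # scipy maps output -> input coords
    ctr = (np.array(k.shape, dtype=float) - 1) / 2.0
    offset = ctr - m_inv @ (ctr + np.asarray(c, dtype=float))
    return affine_transform(k, m_inv, offset=offset, order=1)

def p_mfm(w, k, s_hat, alpha_hat, c_hat):
    """P_MFM of eq. (7): PCE between W and the realigned fingerprint."""
    return pce(w, compensate(k, s_hat, alpha_hat, c_hat))

# Multi-frame fusion (max-PCE picking): attribute the video to the device if
# the maximum of p_mfm over the tested frames exceeds the threshold.
```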
4. EXPERIMENTAL ANALYSIS
In this section we report all the details about the performed experimental campaign and the achieved results.
Dataset.
Our datasets have been extracted from the Vision dataset, which includes both images and videos from major-brand devices [9]. For building the PRNU related to each device, we select all the available images taken by the device depicting flat scenes [11]. Then, each fingerprint $K$ is built by scaling and cropping the PRNU, using the image-to-video warping parameters reported in [8]. Regarding video frames, we select only devices with Full-HD video resolution (i.e., 1920 × 1080 pixels). For the sake of clarity, we make use of the same device nomenclature presented in [9], creating two test datasets: a non-stabilized dataset, selecting the non-stabilized devices D03, D11, D17, D21, D24 from different brands, and a stabilized dataset that includes all the available stabilized devices. Notice that the considered video frames contain both static and motion scenes, denoted as still, panrot, move in [9], and can include almost flat content as well as significant texture, denoted as flat, indoor, outdoor in [9]. In particular, we only make use of the I-frames, as the PRNU traces left on them are likely to be more reliable than those left on inter-predicted frames [6, 16]. Furthermore, in light of past investigations about the first I-frame of stabilized video sequences, we always discard it from the experiments [8, 12].
MFM parameters. To compute the MFM transform, we evaluate the 2D Fourier transform after zero-padding the residue and the reference fingerprint in the pixel domain, in order not to introduce undesired border effects. Then, we convert both terms into the log-polar domain, following the default parameters provided by [17], ending up with MFM transforms with fixed numbers of $\rho$-samples and $\alpha$-samples. We verified that this sampling grid for the $\rho$ and $\alpha$ dimensions allows a correct estimation of the scaling factor and rotation angle. Eventually, we crop the MFM transforms along the $\rho$ dimension according to the chosen number of samples $\Delta\rho$.

The exploited genetic algorithm mimics biological evolution to find a reliable shift estimation. Precisely, it iteratively updates the cost function over a fixed-size population of individuals for a maximum number of iterations; the remaining parameters are those defined in [15].
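Our experiments rely on MATLAB's genetic algorithm from [15]. As a rough Python stand-in implementing the same rationale (a global, derivative-free search over integer shifts, with scale and rotation obtained in closed form by phase correlation for each candidate, as in eq. (6)), one could use SciPy's differential evolution with integer variables (SciPy >= 1.9). This reuses mfm() from Section 3; the bounds and population settings below are illustrative, not the paper's values.

```python
import numpy as np
from scipy.optimize import differential_evolution

def phase_corr_max(a, b):
    """Value of the phase correlation peak between two same-size arrays."""
    cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    return np.abs(np.fft.ifft2(cross / (np.abs(cross) + 1e-12))).max()

def estimate_shift(w, k, delta_rho=200, bound=50):
    """Globally search the integer shift maximizing the objective of eq. (6)."""
    # cropping center computed once on MFM{K}, then reused for every candidate
    lp_k = mfm(k)
    center = int(np.argmax(np.sum(np.abs(lp_k) ** 2, axis=1)))
    mfm_w = mfm(w, delta_rho, center)

    def cost(c):  # negated phase correlation peak, minimized by the optimizer
        k_shift = np.roll(k, (int(round(c[0])), int(round(c[1]))), axis=(0, 1))
        return -phase_corr_max(mfm_w, mfm(k_shift, delta_rho, center))

    res = differential_evolution(cost, bounds=[(-bound, bound)] * 2,
                                 integrality=[True, True],
                                 popsize=20, maxiter=40, seed=0)
    return np.round(res.x).astype(int)
```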
Performance in a controlled scenario. In order to assess the accuracy in attributing video frames to the correct device, we investigate the proposed method in a controlled scenario. Specifically, considering the non-stabilized dataset, we randomly select I-frames from each device, taking care of equally distributing motion and static scenes, as well as flat and textured content, thus ending up with a pool of test video frames. In particular, we select only frames reporting acceptable PCE values with respect to the device fingerprint, as suggested in [6, 8]. Then, we warp each frame by means of a similarity transformation, randomly selecting the parameters from realistic ranges [12], namely the sets $\mathcal{S}$, $\mathcal{A}$ and $\mathcal{C}$, related to scale, rotation angle, and horizontal and vertical shifts, respectively. We verified that these ranges include the vast majority of possible similarity transformations between stabilized video frames and the reference fingerprint.
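For concreteness, a frame can be synthetically warped as sketched below, reusing compensate() from Section 3. The numeric bounds shown for $\mathcal{S}$, $\mathcal{A}$ and $\mathcal{C}$ are placeholders of our own (the actual values follow [12] and do not survive in this copy), and the uniform sampling is an assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_similarity_warp(frame,
                           s_range=(0.98, 1.02),   # placeholder for range S
                           a_range=(-2.0, 2.0),    # placeholder for range A, degrees
                           c_range=(-20, 20)):     # placeholder for range C, pixels
    """Warp a frame with a similarity drawn uniformly from the given ranges."""
    s = rng.uniform(*s_range)
    alpha = rng.uniform(*a_range)
    c = rng.integers(c_range[0], c_range[1] + 1, size=2)   # integer shift
    return compensate(frame, s, alpha, c), (s, alpha, c)
```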
Fig. 2: Accuracy on synthetically warped non-stabilized video frames: (a) only scale and rotation are applied; (b) a complete similarity is applied. The proposed method (orange) can be tuned to use different amounts of frequency samples ($\Delta\rho$), thus becoming slower but more accurate.

We aim at estimating the applied transformation using the proposed strategy, comparing the performance with the method presented in [8]. Specifically, we exploit the same parameter configuration for the particle swarm strategy [15, 18] as reported in [8], which enables estimating the similarity transformation returning the highest PCE between $W$ and $K$. For what regards the search bounds of the scale and rotation parameters, we suppose these to be known at the investigation side, thus they coincide with $\mathcal{S}$ and $\mathcal{A}$. Notice that the method in [8] does not need to fix bounds for the shift parameters, as these can be estimated without the need of optimization. Following similar considerations, the proposed MFM strategy fixes the search range for the shift parameters exactly to $\mathcal{C}$, while scale and rotation do not require optimization.

The computational time and the true positive rate evaluated for a PCE threshold of 60 (i.e., $\mathrm{TPR}_{[60]}$) are the chosen accuracy metrics to compare the two strategies. The average time for estimating the similarity transformation on each frame is fixed for the method in [8], while the MFM strategy changes its temporal requirement depending on $\Delta\rho$; in general, the required time grows linearly with $\Delta\rho$.

Fig. 2 shows results as a function of the scene content of the video frames (i.e., flat, indoor and outdoor). Specifically, Fig. 2 (a) reports results where only scale-rotation transformations were applied; the shift between the noise residuals and $K$ is assumed to be known. Fig. 2 (b) reports results where a complete similarity transformation has been applied. It is worth noting that, in case the shift parameter is known and only scale and rotation parameters have to be estimated, our proposal can be a viable solution for very fast identification. Since scale and rotation can be estimated without the need of optimization, the computational time reduces to less than one second. The more $\Delta\rho$ samples are selected, the better the accuracy of the MFM strategy, which overcomes the results of [8]. Furthermore, in this case there is no need for global optimizers, thus the potential optimization error reduces to zero. In case (b), MFM shows better or basically equivalent results to [8] for flat and indoor scenarios, while outdoor frames seem to be more challenging for the proposed method.
Performance on stabilized videos.
In order to show the potential of the MFM approach in dealing with the source device identification problem on real videos, we apply the proposed method to the stabilized video sequences.

Fig. 3: ROC curves obtained testing I-frames with the proposed strategy, as a function of the number of used frequency samples, i.e., $\Delta\rho$. Results are compared to those of [8] evaluated on the same I-frames.
Table 1: AUC and TPR at a fixed false positive rate, testing randomly selected I-frames per video query, together with the average computational time per frame, evaluated with the MFM and [8] methods.

Method             MFM ∆ρ=200   MFM ∆ρ=400   MFM ∆ρ=600   MFM ∆ρ=800   [8]
AUC                0.93         0.93         0.94         0.97
TPR (fixed FPR)    0.85         0.86         0.90         0.94
Time [s]           22           46           69           105          57

Following previous considerations, we set a search range $\mathcal{C}$ for the mutual shift, both in the horizontal and vertical directions. For clarity's sake, we use the very same accuracy metrics presented in [8], i.e., the area under the ROC curve (AUC) and
the true positive rate (TPR) of the ROC curves at a fixed false positive rate, averaged over all devices. Precisely, the latter corresponds to the rate of correct attributions evaluated when the false positive attribution rate equals the chosen operating point.

We show the attribution results achieved by testing a set of randomly selected I-frames per video query and picking the maximum value among the computed PCEs. Specifically, we test different values for the number of used frequency samples (i.e., $\Delta\rho$) and always report the results achieved by [8] over the same dataset. Fig. 3 draws the ROC curves and Table 1 reports the achieved AUC and TPR as a function of $\Delta\rho$. Moreover, the last row of Table 1 reports the average computational time (in seconds) required for testing one query frame according to the chosen strategy, considering matching cases as well as non-matching ones. It is worth noticing that the proposed approach can overcome the results of [8], provided that a sufficient amount of frequency samples is selected. Furthermore, the MFM strategy enables fast computations as well, at the expense of a slightly reduced, but still acceptable, accuracy.
5. CONCLUSIONS
In this paper, we propose an alternative solution for solving the source device identification problem on stabilized videos. Specifically, we re-synchronize video frames and the device reference fingerprint by estimating the re-alignment transformation with a modified version of the Fourier-Mellin transform. In doing so, we search the scaling and rotation parameters in the frequency domain, whereas unknown translations can be estimated by leveraging global optimization strategies. Moreover, we propose to use a reduced amount of Fourier-Mellin transform samples to estimate the warping configuration, thus enabling fast computations.

The experimental campaign is conducted on a publicly available dataset. Results are promising and show enhanced performance with respect to the state of the art. This is especially true in situations where only scale and rotation parameters have to be estimated: experiments performed in a synthetic set-up reveal that the proposed method can be much faster and more accurate than existing methodologies.

6. REFERENCES

[1] Jan Lukas, Jessica Fridrich, and Miroslav Goljan, "Digital camera identification from sensor pattern noise," IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 205–214, 2006.
[2] Nicola Mondaini, Roberto Caldelli, Alessandro Piva, Mauro Barni, and Vito Cappellini, "Detection of malevolent changes in digital video for forensic applications," in Security, Steganography, and Watermarking of Multimedia Contents IX. International Society for Optics and Photonics, 2007, vol. 6505, p. 65050T.
[3] Aniello Castiglione, Giuseppe Cattaneo, Maurizio Cembalo, and Umberto Ferraro Petrillo, "Experimentations with source camera identification and online social networks," Journal of Ambient Intelligence and Humanized Computing, vol. 4, no. 2, pp. 265–274, 2013.
[4] Flavio Bertini, Rajesh Sharma, Andrea Iannì, and Danilo Montesi, "Social media investigations using shared photos," in Proceedings of the International Conference on Computing Technology, Information Security and Risk Management (CTISRM), 2016, p. 47.
[5] Miroslav Goljan and Jessica Fridrich, "Camera identification from cropped and scaled images," in Security, Forensics, Steganography, and Watermarking of Multimedia Contents X. International Society for Optics and Photonics, 2008, vol. 6819, p. 68190E.
[6] Samet Taspinar, Manoranjan Mohanty, and Nasir Memon, "Source camera attribution using stabilized video," in IEEE International Workshop on Information Forensics and Security (WIFS), 2016, pp. 1–6.
[7] Massimo Iuliani, Marco Fontani, Dasara Shullani, and Alessandro Piva, "Hybrid reference-based video source identification," Sensors, vol. 19, no. 3, 2019.
[8] Sara Mandelli, Paolo Bestagini, Luisa Verdoliva, and Stefano Tubaro, "Facing device attribution problem for stabilized video sequences," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 14–27, 2019.
[9] Dasara Shullani, Marco Fontani, Massimo Iuliani, Omar Al Shaya, and Alessandro Piva, "VISION: a video and image dataset for source identification," EURASIP Journal on Information Security, vol. 2017, no. 1, p. 15, Oct. 2017.
[10] B. Srinivasa Reddy and Biswa Nath Chatterji, "An FFT-based technique for translation, rotation, and scale-invariant image registration," IEEE Transactions on Image Processing, vol. 5, no. 8, pp. 1266–1271, 1996.
[11] Mo Chen, Jessica Fridrich, Miroslav Goljan, and Jan Lukáš, "Determining image origin and integrity using sensor noise," IEEE Transactions on Information Forensics and Security, vol. 3, no. 1, pp. 74–90, 2008.
[12] Matthias Grundmann, Vivek Kwatra, and Irfan Essa, "Cascaded camera motion estimation, rolling shutter detection, and camera shake detection for video stabilization," US Patent 9,888,180, Feb. 6, 2018.
[13] Zhongqiang Wang, Lei Zhang, and Hua Huang, "High-quality real-time video stabilization using trajectory smoothing and mesh-based warping," IEEE Access, vol. 6, pp. 25157–25166, 2018.
[14] Luca Bondi, Paolo Bestagini, Fernando Perez-Gonzalez, and Stefano Tubaro, "Improving PRNU compression through preprocessing, quantization and coding," IEEE Transactions on Information Forensics and Security, vol. 14, pp. 608–620, 2019.
[15] Mathworks, "Global Optimization Toolbox - MATLAB R2016a," 2016.
[16] Wei-Hong Chuang, Hui Su, and Min Wu, "Exploring compression effects for improved source camera identification using strongly compressed video," in IEEE International Conference on Image Processing (ICIP), 2011.
[17] Mathworks, "Image Processing Toolbox - MATLAB R2016a," 2016.
[18] James Kennedy, "Particle swarm optimization," in Encyclopedia of Machine Learning, Springer, 2010.