Provenance Filtering for Multimedia Phylogeny
Allan Pinto, Daniel Moreira, Aparna Bharati, Joel Brogan, Kevin Bowyer, Patrick Flynn, Walter Scheirer, Anderson Rocha
Department of Computer Science and Engineering, University of Notre Dame, IN, U.S.A.
Institute of Computing, University of Campinas, SP, Brazil
ABSTRACT
Departing from traditional digital forensics modeling, which seeks to analyze single objects in isolation, multimedia phylogeny analyzes the evolutionary processes that influence digital objects and collections over time. One of its integral pieces is provenance filtering, which consists of searching a potentially large pool of objects for the ones most related to a given query, in terms of possible ancestors (donors or contributors) and descendants. In this paper, we propose a two-tiered provenance filtering approach to find all the potential images that might have contributed to the creation process of a given query q. In our solution, the first (coarse) tier aims to find the most likely "host" images (the major donor or background) contributing to a composite/doctored image. The search is then refined in the second tier, in which we search for more specific (potentially small) parts of the query that might have been extracted from other images and spliced into the query image. Experimental results with a dataset containing more than a million images show that the two-tiered solution, underpinned by the context of the query, is highly useful for solving this difficult task.

Index Terms: Provenance Filtering; Multimedia Phylogeny; Phylogeny Graph; Provenance Context Incorporation.
1. INTRODUCTION AND RELATED WORK
Rather than focusing on checking the integrity of a single multimedia object (as most methods proposed from the early 2000s until recently did), some researchers in digital forensics are now seeking to leverage all possible information associated with a pool of objects, analyzing their space and time relationships. Such recent efforts are made possible by a research field known as Multimedia Phylogeny [3, 1], a relatively new discipline that studies the evolutionary processes that influence multimedia objects and collections, as well as the relationships among transformed versions of an object, looking for causal and ancestry relationships, the types of transformations, and the order in which they were applied. Such new developments are necessary to adapt forensics methods to a rapidly evolving society. The increasingly frequent occurrence of image and video compositions on the Internet and social media renders the applications of phylogeny very useful in practical scenarios such as content tracking, forensics, and copyright enforcement [3, 1]. Within this new reality, forensics analysts are interested not only in determining whether a digital object is fake or real but also
This material is based on research sponsored by DARPA and the Air Force Research Laboratory (AFRL) under agreement number FA8750-16-2-0173. Hardware support was generously provided by the NVIDIA Corporation. We also thank the financial support of FAPESP (Grant
[Fig. 1, panel (a): semantically-similar and near-duplicate images (original; cropping + resizing; exposure + saturation). Panel (b): a multiple-parenting multimedia phylogeny setup, with an image composition, its potential host (major donor), and several alien donors.]
Fig. 1. Contrasting multimedia phylogeny applied to near-duplicate images (a) and image composites with several donors (b). While the former focuses on finding relationships among images that have similar overall context, the latter aims at finding the genealogy of an asset, including all possible near duplicates of the composition itself and of its donors. Example in (a) from [1]; example in (b) from the NIST Nimble 2016 dataset [2].

in pinpointing who created it, what happened, and when and how (genealogy) an asset was created. This process might be of significant importance in the era of post-truth [4, 5, 6] for determining how a composition was crafted, what parts went into creating the composite, and whether there was re-staging, re-purposing, or an overall change of semantics [7]. Nonetheless, before analyzing a pool of objects looking for possible kinship relationships, we need to be able to comb through large quantities of data looking for the very pieces potentially associated with a given query q. This task needs to be performed prior to subsequent multimedia phylogeny steps, namely the pairwise image dissimilarity calculations and the phylogenetic graph analysis and construction, and it is referred to herein as provenance filtering. Most of the work thus far in multimedia phylogeny has overlooked the provenance filtering task, considering it to be a reasonably well-solved problem [3, 1]. The rationale behind that assumption was that most phylogeny works focused on finding the evolutionary processes associated with near-duplicate [3] and semantically-similar images [1]. In both setups, original images may undergo transformations over time but cannot have their overall semantics changed. When we consider forged and composite images, we bring new elements to the table.
In this case, we now have the appearance of multiple parenting phylogeny [8], a setup in which an image might be the composite result of several other images, each with its own evolutionary chain of modifications. The composite image itself might also have its own chain of descendants, and so on. Fig. 1(a) shows an example of semantically-similar images in which an original image might undergo several transformations and generate offspring. Each child can also generate others. However, the transformations tend to keep the overall meaning of the scene. In turn, as we see in Fig. 1(b), an image in a multiple parenting setup might be the result of combining several others, each of which has its own chain of ancestors and descendants. Near-duplicate detection (NDD) methods [9, 10, 11, 12, 13] work properly for the task of finding semantically-similar images (Fig. 1(a)), upon which phylogeny graph construction algorithms could operate later on. However, NDD methods might fail in the presence of multiple donors (Fig. 1(b)), given that the context and meaning of each donor is too diverse to be represented and captured by current methods. Moreover, each donor might undergo several transformations in the composition creation process, including color, geometric, and affine operations. For those cases, even partial near-duplicate detection methods could fail [14]. Likewise, traditional content-based image retrieval (CBIR) methods [15] would not work directly either, as they often aim to determine the overall meaning of the scene and its generalization to provide the user with similar images respecting the principles of novelty and diversity [16]. While related work for multimedia phylogeny abounds, prior work on provenance filtering is almost non-existent. In terms of phylogeny, Dias et al. [3] presented a minimum spanning tree-based algorithm to find a directed graph that represents the phylogeny tree of a group of near-duplicate images.
This work was extended to deal with images from multiple cameras and their near duplicates [1]. Other media have also been considered, such as videos [17, 18], audio [19] and text [20]. Oliveira et al. [8] extended the image phylogeny formulation to deal with multiple donors and descendants simultaneously, which is more aligned with the context of this paper. However, their work assumes the candidate images are known a priori. Important advances have been made on finding ancestral relationships between pairs of images; nevertheless, the performance of such algorithms is significantly degraded if a good set of potentially related images is not found beforehand. In this vein, we extend upon image representation and indexing techniques (common in the NDD and CBIR areas) to deal with provenance filtering for multiple donor and composite images. Our technique comprises two stages: in the first, we query an image collection for the most likely donors that might have contributed to the creation of the query, if it is a composite. This is done following a traditional CBIR pipeline, which involves image representation through appropriate features and the adoption of a subsequent indexing mechanism (more details in Sec. 2). The top retrieved results are then analyzed and compared to the query using scale- and rotation-invariant points of interest [21], the nearest-neighbor distance ratio policy [22], and geometric alignment [23]. After finding the best possible match to the query, we use that image along with the query to calculate a contextual mask that serves as an activation of possible regions that are different between them. Such regions are candidate regions for possible donors. We then proceed with the second stage of the search, querying the collection for images that are similar to the selected regions of interest in the query, as pointed out by the contextual mask. Ultimately, we aggregate the different rankings to create a final ranked list of images related to the query in terms of possible donors contributing to its creation process, thus closing the loop for provenance filtering.

[Fig. 2: offline stage (collection indexing and image characterization) and online stage (querying, best match r_best, and context incorporation).]

Fig. 2. Method's pipeline. After retrieving related images, we compare the best result with q, incorporate the search's context, and perform a second search to refine the list of possible donors.

The contributions of this work are (i) the exploration of different querying and indexing techniques for the new problem of provenance filtering; (ii) the incorporation of provenance context to single out possible candidate regions related to donors in the creation process of a query; and (iii) the study of the efficiency and effectiveness tradeoffs involved in the provenance filtering task when dealing with very large collections of images.
2. PROPOSED METHOD
In this section, we present the proposed approach to provenance filtering. Given a query q, such as the image in the center of Fig. 1(b), the objective is to search a collection of images C for all potential donors r_i contributing to the creation of q, including possible near duplicates r_ij of r_i. Near duplicates of q are also of interest, as they would be important for tracing the offspring of q over time. Our approach to this problem involves two stages (c.f., Fig. 2). In the first stage, we design a fast image retrieval solution to recover the (likely) donor images with high precision. We then exploit the context of the results to find the best match r_best (respecting geometric constraints) with respect to q and refine the donor list. Regions that are different between q and its top-related image r_best are of interest, as they show regions that might have been incorporated into q by combining pieces of different images in C. Leveraging the contextual mask, the second stage of the search examines C a second time, focusing on finding potential localized donors. In the example of Fig. 1(b), when querying the collection for potential donors (first tier/stage), we would likely retrieve the image with the table, flower and their background, or the hand (as both are major contributors to the composite q). Calculating the contextual mask gives the region of the hand as a potential donor spliced from another source image(s). Therefore, when performing the second search, we look for images similar to that region, which would result in the donor for the hand as well as the other pieces. This process can be repeated a number of times if necessary. The different retrieved lists of results might be combined through rank-aggregation techniques based on the confidence of the retrieved results. The first step of our approach needs to represent each image in a robust manner so as to allow us to retrieve partially related images in a large collection.
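The two-tier loop just described can be sketched in a few lines of Python. This is an illustrative skeleton only: `search_collection`, `best_geometric_match`, and `contextual_mask` are hypothetical stand-ins for the retrieval, geometric-verification, and masking components detailed in the remainder of this section.

```python
def two_tier_filtering(query, collection, search_collection,
                       best_geometric_match, contextual_mask, top_k=50):
    """Sketch of the two-tier search; rankings are (image_id, votes) lists."""
    # Tier 1: retrieve the likely hosts / major donors for the whole query.
    first_rank = search_collection(query, collection, top_k)

    # Pick the best geometrically consistent match (likely the host).
    r_best = best_geometric_match(query, first_rank)
    if r_best is None:  # nothing in common with the query: stop at tier 1
        return first_rank

    # Regions where query and host differ are candidate spliced donors.
    mask = contextual_mask(query, r_best)

    # Tier 2: re-query the collection using only the masked regions.
    second_rank = search_collection((query, mask), collection, top_k)

    # Aggregate both rankings by retrieval confidence (vote counts).
    combined = {}
    for rank in (first_rank, second_rank):
        for image_id, votes in rank:
            combined[image_id] = combined.get(image_id, 0) + votes
    return sorted(combined.items(), key=lambda kv: -kv[1])
```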
In this context, using bags of words [15] or deep learning techniques [24] would likely fail, as they would be good for retrieving similar images in general but would not capture possible transformed donors, especially the small or heavily processed ones. In addition, a deep learning solution would require large image collections spanning different forgeries for proper training and, in forensics, such collections are simply not available. In face of these limitations, we opted to represent each image using points of interest robust to image transformations, as forgeries often employ such transformations for more photorealistic montages. For that, we rely upon Speeded-Up Robust Features (SURF) [21]. We represent an image with about 2000 keypoints for small-scale experiments and with about 500 keypoints for large-scale ones.
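The per-image keypoint budget mentioned above (about 2000 keypoints for small-scale runs, about 500 for large-scale ones) can be enforced by keeping only the strongest detections. A minimal sketch, assuming each keypoint comes with a detector response score (as SURF detectors provide); the function name is illustrative:

```python
import numpy as np

def strongest_keypoints(descriptors, responses, budget):
    """Keep only the `budget` keypoints with the highest detector
    response, capping per-image indexing cost at large scale."""
    responses = np.asarray(responses)
    descriptors = np.asarray(descriptors)
    if len(responses) <= budget:
        return descriptors
    top = np.argsort(responses)[::-1][:budget]  # strongest first
    return descriptors[top]
```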
Given a query image q and a collection of images C for searching, we need to represent the images in C in a very compact fashion so as to allow fast querying. For that, we use an indexing algorithm for finding nearest neighbors of q, in terms of their representative keypoints. More specifically, after extracting the points of interest for all images in C, we need to find the k-nearest points to each keypoint in q. We further perform majority voting to infer the similarity between the query image q and each image in C based on the nearest keypoints retrieved from the gallery. As the number of points of interest extracted from C might reach hundreds of millions, comparing q against all images in C using brute-force search is impracticable. Therefore, we investigated algorithms for ε-approximate nearest neighbors, adequate for large-scale searches. According to Arya [25], an approximate search can be achieved by considering (1 + ε)-approximate nearest neighbors, i.e., points k for which dist(k, l) ≤ (1 + ε) · dist(p, l), where p is the true nearest neighbor of the query point l. Nonetheless, these solutions might lose effectiveness depending on the heuristic adopted to speed up the search. For this reason, here we compare four indexing approaches in terms of runtime, memory footprint, and quality of the search: KD-Trees and KD-Forests [26], Hierarchical Clustering [27], and Product Quantization [28]. To retrieve the donor images with high recall rates, we propose a query-refinement process, referred to as context incorporation, in which we use the ranking result obtained in the first tier to reformulate the query so that small objects used to compose the spliced image can be retrieved more accurately. First, we need to make sure the query is well represented in terms of describing keypoints. The overrepresentation of the query q aims at guaranteeing we sample basically all of its regions, including the background.
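A minimal sketch of the first-tier voting follows, using SciPy's cKDTree, whose `eps` argument implements exactly this (1 + ε)-approximate search. The descriptor matrices and the per-keypoint image labels are assumed inputs; this is one possible realization, not the exact implementation used in the paper.

```python
import numpy as np
from collections import Counter
from scipy.spatial import cKDTree

def rank_gallery(query_desc, gallery_desc, gallery_image_ids, k=5, eps=1.0):
    """Rank gallery images by majority voting over the
    (1+eps)-approximate k-nearest neighbors of each query keypoint."""
    tree = cKDTree(gallery_desc)
    # eps > 0 returns (1+eps)-approximate nearest neighbors.
    _, idx = tree.query(query_desc, k=k, eps=eps)
    votes = Counter()
    for neighbors in np.atleast_2d(idx):
        for j in neighbors:
            votes[gallery_image_ids[j]] += 1  # one vote per matched keypoint
    return votes.most_common()
```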
Although SURF descriptors are robust for describing objects in a scene in general, this approach will most likely fail to find interest points inside very small objects, mainly when such objects are placed in a complex background. To overcome this problem, we perform a query refinement by computing the intersection between q and the best-matching retrieved image (most likely the host/background donor). This leads to a new query image containing just the information about the objects added to the host image. Our second search stage consists of querying the collection using the keypoints falling within the selected regions of interest. We combine the different ranked lists using the confidence of the retrieved images (number of votes and keypoints matched).

Fig. 3. Example of a query, its top-related donor, and the contextual mask. In the top row, the contextual mask captures the added rocks, person, bird, and red-dirt region. In turn, the mask in the second row captures the added umbrella, the content-smoothed sand on the left, and the deleted white bird.
To find the contextual mask, we perform an image registration between q and the top-match image r_best in the ranked list obtained in the first tier of the search. We match SURF features extracted from both images, select the best-matching keypoints, and calculate the distance between the two images using the selected pairs of matches. We then estimate the geometric transformation of r_best with respect to q via image homography. Next, we compute the mask that indicates the candidate regions in which we might have spliced objects. We generate this mask by computing the difference between the geometrically aligned images, followed by a morphological opening and a median filter to reduce the residual noise present in the mask. We also perform color quantization to 32 levels before computing the difference between the two images to further reduce noise in the mask. There are some extreme cases of this approach that are worth discussing. First, when the top retrieved image does not have anything in common with q, the calculated mask should be null. In this case, there should be no search in the second tier. In turn, when q itself is not a composite, the top retrieved image might be non-related at all (case one above) or a near-duplicate of q, in which case the mask is virtually identical to q. In the latter case, the search in the second tier should result in basically the same images retrieved in the first tier. Fig. 3 depicts examples of a query q, its top result r, and the calculated contextual masks.
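The masking steps can be sketched with NumPy/SciPy as follows, under two simplifying assumptions: the two images are already geometrically aligned (the homography estimation described above is omitted), and the structuring-element and median-filter sizes, which are not specified here, are taken as 3x3.

```python
import numpy as np
from scipy import ndimage

def contextual_mask(query, r_best, levels=32, thresh=2):
    """Candidate spliced regions between two aligned grayscale images
    (uint8 arrays of identical shape).  Homography-based alignment is
    assumed to have been applied beforehand."""
    # Coarse quantization suppresses compression/interpolation noise.
    qq = (query.astype(np.int32) * levels) // 256
    rr = (r_best.astype(np.int32) * levels) // 256
    diff = np.abs(qq - rr) >= thresh
    # Morphological opening removes small, noisy activations...
    mask = ndimage.binary_opening(diff, structure=np.ones((3, 3)))
    # ...and a median filter cleans residual speckle.
    return ndimage.median_filter(mask.astype(np.uint8), size=3)
```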
3. EXPERIMENTS AND RESULTS
In this section, we present and discuss the experiments performed to validate the proposed method. We report the quality of the results in terms of Recall@k, which measures the fraction of correct images among the top-k retrieved results. The source code of all proposed methods is freely available.

Datasets.
We adopt the Nimble Challenge 2016 (NC2016) and 2017 (NC2017) datasets, provided by the National Institute of Standards and Technology (NIST) [2], which focus on forensics, provenance filtering, and phylogeny tasks. These datasets comprise a query set containing different kinds of manipulated images (e.g., copy-move and compositions) and a gallery set containing the source images used to produce the queries. The datasets also comprise distractor images. The probe sets of NC2016 and NC2017 contain composite images, and the gallery sets contain the corresponding source images. We also embed the datasets within one million images (distractors) provided by RankOne Inc. (http://medifor.rankone.io/), as recommended by NIST for evaluating scalability. (The source code is freely available at https://gitlab.com/notredame-provenance/filtering.)

[Fig. 4: four Recall@K panels (KD-Tree, KD-Forest, PQ, HCAL), each comparing first-tier ranking results with context incorporation.]
Fig. 4. First- and second-tier results for the NC2017 dataset in terms of Recall@k. Context incorporation is important regardless of the indexing technique used.
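The Recall@k figures reported throughout can be computed with a few lines of Python; `retrieved` is the ranked list of image IDs returned for a query and `relevant` is the ground-truth set of donor/host images for that query (names are illustrative):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant (ground-truth donor/host) images
    that appear among the top-k retrieved results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)
```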
Table 1. Runtime (in seconds) and memory usage (GB), per query, in the first tier, for different indexing techniques on the NC2017 and NC2017 + World1M datasets. KD-Forest comprises two trees. * denotes that the method did not scale.

Method               KD-Tree   KD-Forest   PQ     HCAL
Runtime              . s       . s         . s    . s
Memory               . GB      . GB        . GB   . GB
Runtime (World1M)    . s       . s         *      *
Memory (World1M)     . GB      . GB        *      *

Indexing Method.
We now analyze (see Table 1) different indexing approaches for NC2017 and NC2017 + World1M in terms of memory footprint and efficiency (results for NC2016 are similar), considering an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with 24 cores and 512GB of RAM. Although PQ is more efficient in terms of storage at a small scale, it does not scale to World1M. The clustering in HCAL also prevented it from scaling to one million images; more work involving approximate clustering and sampling would be necessary in this case. KD-Tree shows a good storage and efficiency tradeoff.
Context Incorporation and Ranking Aggregation.
In this section, we evaluate the proposed approach to improve ranking results for donor images. Fig. 4 shows the performance results in terms of recall at the top-k retrieved images, considering the retrieval of donor images in the first and second tiers of the proposed method. Although not shown here, the performance for retrieving the host image is always above 95%, as it shares much content with q. The challenge in provenance filtering is in retrieving the donors.

Large-scale Image Retrieval.
We now evaluate the proposed approach in a more challenging scenario, in which we embed the NC2016 and NC2017 datasets into one million images, hereinafter referred to as the World1M dataset. The World1M dataset contains several images that are semantically similar to the images composing both datasets. Table 2 shows the results obtained in this experiment. There is a gain when retrieving donors for NC2016 when we compare the results obtained in the first and second tiers. The results for NC2017 are slightly lower given that the composite images in this dataset are more difficult, more photorealistic, and smaller with respect to the whole image, which also impacts the context incorporation in the second tier (first- and second-tier results remain equal in this case). Future work includes improving the context-incorporation mask to better capture small donors such as those present in NC2017.

Table 2. Performance results for the NC2016 and NC2017 datasets embedded in one million images, using KD-Forest (2 trees). Bold highlights improvements in the second tier.

Dataset              Type    Tier   Recall@10
NC2016 + World1M     Host    2nd    %
NC2016 + World1M     Donor   2nd    %
NC2017 + World1M     Host    2nd    .
NC2017 + World1M     Donor   2nd    .

Fig. 5. Queries and results for KD-Forest with two trees. The first and third rows refer to the first-tier results, while the second and fourth refer to the second tier. The green border denotes the matched host, while the blue ones denote donors. The search in the second tier allows the retrieval of donors that were not present in the first tier.
Qualitative Analysis.
Fig. 5 shows the results of two queries for KD-Forest with two trees in the first and second tiers.
4. CONCLUSIONS
In this paper, we introduced a first method for provenance filtering designed to improve the retrieval of donor images for composite images. Reliable provenance filtering is highly useful for selecting the most promising candidates for more complex analyses in the multimedia phylogeny pipeline, such as graph construction and inference of the directionality of donors and descendants. The challenge in this problem is the retrieval of small objects from a large image gallery. By incorporating the context of the top results with respect to the query itself, we can improve the retrieval results and better find possible donors of a given composite (forged) query q. Experiments with different indexing techniques have also shown that KD-Forests seem to be the most effective but not the most efficient. KD-Trees, on the other hand, are more efficient but less effective. In our experiments, PQ did not perform well for large galleries. Future research efforts will focus on better characterizing small forged regions, incorporating forgery detectors in the process of context analysis, and bringing the user into the loop with relevance-feedback methods.

5. REFERENCES

[1] Zanoni Dias, Siome Goldenstein, and Anderson Rocha, "Toward image phylogeny forests: Automatically recovering semantically similar image relationships," Forensic Science International.
[2] National Institute of Standards and Technology (NIST), Nimble Challenge 2016/2017 datasets.
[3] Zanoni Dias, Anderson Rocha, and Siome Goldenstein, "Image phylogeny by minimal spanning trees," IEEE Transactions on Information Forensics and Security (TIFS), vol. 7, no. 2, pp. 774–788, April 2012.
[4] Ralph Keyes, The Post-Truth Era: Dishonesty and Deception in Contemporary Life, Macmillan, 2004.
[5] Jonathan Mahler, "The problem with self-investigation in a post-truth era," The New York Times Magazine, January 1st, 2017. Available online at http://tinyurl.com/juufufc.
[6] Katherine Schulten and Amanda Christy Brown, "Evaluating sources in a 'post-truth' world: Ideas for teaching and learning about fake news," The New York Times, January 19th, 2017. Available online at http://tinyurl.com/h3w7rp8.
[7] A. Rocha, W. Scheirer, T. E. Boult, and S. Goldenstein, "Vision of the unseen: Current trends and challenges in digital image and video forensics," ACM Computing Surveys (CSUR), vol. 43, pp. 1–42, 2011.
[8] Alberto A. de Oliveira, Pasquale Ferrara, Alessia De Rosa, Alessandro Piva, Mauro Barni, Siome Goldenstein, Zanoni Dias, and Anderson Rocha, "Multiple parenting phylogeny relationships in digital images," IEEE Transactions on Information Forensics and Security, vol. 11, no. 2, pp. 328–343, 2016.
[9] Yan Ke, Rahul Sukthankar, and Larry Huston, "Efficient near-duplicate detection and sub-image retrieval," in ACM Intl. Conference on Multimedia, 2004, pp. 869–876.
[10] Wengang Zhou, Yijuan Lu, Houqiang Li, Yibing Song, and Qi Tian, "Spatial coding for large scale partial-duplicate web image search," in ACM Int. Conference on Multimedia, New York, NY, USA, 2010, pp. 511–520.
[11] S. Tang, H. Chen, K. Lv, and Y. D. Zhang, "Large visual words for large scale image classification," in IEEE Int. Conference on Image Processing (ICIP), Sept. 2015, pp. 1170–1174.
[12] J. Yuan and X. Liu, "Product tree quantization for approximate nearest neighbor search," in IEEE Int. Conference on Image Processing (ICIP), Sept. 2015, pp. 2035–2039.
[13] K. H. Zeng, Y. C. Lin, A. Farhadi, and M. Sun, "Semantic highlight retrieval," in IEEE Int. Conference on Image Processing (ICIP), Sept. 2016, pp. 3359–3363.
[14] Wei Dong, Zhe Wang, Moses Charikar, and Kai Li, "High-confidence near-duplicate image detection," in ACM Int. Conference on Multimedia Retrieval, New York, NY, USA, 2012, pp. 1:1–1:8.
[15] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang, "Image retrieval: Ideas, influences, and trends of the new age," ACM Computing Surveys (CSUR), vol. 40, no. 2, p. 5, 2008.
[16] Thomas Deselaers, Tobias Gass, Philippe Dreuw, and Hermann Ney, "Jointly optimising relevance and diversity in image retrieval," in ACM Int. Conference on Multimedia Retrieval, 2009, p. 39.
[17] Zanoni Dias, Anderson Rocha, and Siome Goldenstein, "Video phylogeny: Recovering near-duplicate video relationships," in IEEE Int. Workshop on Information Forensics and Security (WIFS), 2011, pp. 1–6.
[18] Silvia Lameri, Paolo Bestagini, Ambra Melloni, Simone Milani, Anderson Rocha, Marco Tagliasacchi, and Stefano Tubaro, "Who is my parent? Reconstructing video sequences from partially matching shots," in IEEE Int. Conference on Image Processing (ICIP), 2014, pp. 5342–5346.
[19] Matteo Nucci, Marco Tagliasacchi, and Stefano Tubaro, "A phylogenetic analysis of near-duplicate audio tracks," in IEEE Int. Workshop on Multimedia Signal Processing (MMSP), 2013, pp. 99–104.
[20] Nicholas Andrews, Jason Eisner, and Mark Dredze, "Name phylogeny: A generative model of string variation," in Intl. Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Association for Computational Linguistics, 2012, pp. 344–355.
[21] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, June 2008.
[22] David G. Lowe, "Object recognition from local scale-invariant features," in IEEE Int. Conference on Computer Vision and Pattern Recognition (CVPR), 1999, vol. 2, pp. 1150–1157.
[23] Barbara Zitová and Jan Flusser, "Image registration methods: A survey," Image and Vision Computing, vol. 21, no. 11, pp. 977–1000, 2003.
[24] Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
[25] Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and Angela Y. Wu, "An optimal algorithm for approximate nearest neighbor searching in fixed dimensions," Journal of the ACM, vol. 45, no. 6, pp. 891–923, Nov. 1998.
[26] Jon Louis Bentley, "Multidimensional binary search trees used for associative searching," Communications of the ACM, vol. 18, no. 9, pp. 509–517, Sept. 1975.
[27] Michael Steinbach, George Karypis, and Vipin Kumar, "A comparison of document clustering techniques," in KDD Workshop on Text Mining, 2000.
[28] H. Jégou, M. Douze, and C. Schmid, "Product quantization for nearest neighbor search," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, 2011.