A Filter of Minhash for Image Similarity Measures
Jun Long, Qunfeng Liu, Xinpan Yuan, Chengyuan Zhang, Junfeng Liu
Abstract
Image similarity measures play an important role in nearest neighbor search and duplicate detection for large-scale image datasets. Recently, Minwise Hashing (or Minhash) and its related hashing algorithms have achieved great performance in large-scale image retrieval systems. However, these applications involve a large number of image-pair comparisons, which may consume considerable computation time and degrade performance. In order to quickly obtain the image pairs whose similarities are higher than a specified threshold T (e.g., 0.5), we propose a dynamic threshold filter of Minwise Hashing for image similarity measures. It greatly reduces the calculation time by terminating unnecessary comparisons in advance. We also find that the filter can be extended to other hashing algorithms, as long as the estimator satisfies the binomial distribution, such as b-Bit Minwise Hashing, One Permutation Hashing, etc. In this paper, we use the Bag-of-Visual-Words (BoVW) model based on the Scale Invariant Feature Transform (SIFT) to represent image features. Experiments on real image datasets show that the filter is correct and effective.
Keywords
Image similarity measures, BoVW, SIFT, Minwise Hashing, Dynamic threshold filter
Jun Long, E-mail: [email protected]
Qunfeng Liu, E-mail: [email protected]
XinPan Yuan, E-mail: [email protected]
Chengyuan Zhang, E-mail: [email protected]
Junfeng Liu, E-mail: [email protected]

† School of Information Science and Engineering, Central South University, PR China
♮ Big Data and Knowledge Engineering Institute, Central South University, PR China
‡ School of Computer, Hunan University of Technology, China

1 Introduction

In recent years, with the rapid development of the Mobile Internet and social multimedia, a large number of images and videos are generated on the Internet every day. In the Mobile Internet era, people can take all kinds of pictures at any time and in any place and share them with their friends on the Internet, which results in the explosive growth of digital pictures. At present, hundreds of millions of pictures are uploaded to social media platforms such as Facebook, Twitter and Flickr every day. How to quickly search for similar images has become a hot topic for multimedia researchers, and the related research areas have also attracted much attention [8].

Image similarity measures aim to estimate whether a given pair of images is similar or not. They play an important role in nearest neighbor search and near-duplicate detection for large-scale image resources. Recently, the Bag-of-Visual-Words (BoVW) model [39,40] with local features such as SIFT [12] has been proven to be among the most successful and popular local image descriptors. In the BoVW model, a bag of visual words is used to represent each image. The visual words are usually generated by clustering the extracted SIFT features. The SIFT descriptor is widely used in image matching [9,18,19] and image search [41,11,22].

At the beginning, Minwise Hashing was mainly designed for measuring set similarity. The algorithm is widely used for near-duplicate web page detection and clustering [2,5,22], set similarity measures [1], nearest neighbor search [6], large-scale learning [10,23], etc. In recent years, Minwise Hashing has also been applied to computer vision applications.
A weighted min-Hash method has been proposed to find near-duplicate images. Grauman [7] combined distance metric learning with the min-Hash algorithm to improve image retrieval performance. A highly efficient min-Hash generation method for image collections was proposed by Chum and Matas [3]. Zhao developed Sim-Min-Hash, an efficient matching technique for linking large image collections [38]. Qu [15] proposed a spatial min-Hash algorithm for similar image search. In addition, some multimedia researchers have proposed learning hash functions for image similarity search [21,29,31,24]. All these methods have achieved great performance in image similarity measures and image search. However, many image retrieval or search systems involve huge numbers of image-pair comparisons, which may consume substantial computation time and negatively impact system performance.

Inspired by the successes of Minwise Hashing in image similarity measures, our main contributions are as follows:

– We propose a dynamic threshold filter of Minwise Hashing for image similarity measures. The filter divides the whole fingerprint comparison process into a series of comparison points and sets the corresponding thresholds. At the k-th comparison point, the method filters out the image pairs whose similarities are less than the lower threshold T_L(k). Meanwhile, the algorithm outputs the image pairs whose similarities are higher than the upper threshold T_U(k). It greatly reduces the calculation time by terminating unnecessary comparisons in advance.

– We find that the filter can be extended to other hashing algorithms for image similarity measures, as long as the estimator satisfies the binomial distribution, such as b-Bit Minwise Hashing, One Permutation Hashing, etc.
Table 1: Notations

R: resemblance
S: set
J(S1, S2): the original Jaccard similarity of S1 and S2
K: the number of random permutations (fingerprints)
k: the k-th comparison point
π: a random permutation function
Ω: the whole set of elements in the random permutation process
π(S): the hash values of the set S given a hash function π(·)
min(π(S)): the minimum hash value in π(S)
Pr: probability
R_M: the estimator of Minwise Hashing
R_M(k): the estimated similarity of Minwise Hashing at the k-th comparison point
T: a specified threshold
T_L(k): the lower threshold at the k-th comparison point
T_U(k): the upper threshold at the k-th comparison point
X: the number of times that the fingerprints are equal
F(x): the distribution function of X
e: a small probability
m: a solution of the equation
H: the hypothesis in the hypothesis testing

Roadmap.
The rest of the paper is organized as follows: Section 2 discusses the related work. Section 3 describes the dynamic threshold filter in detail. In Section 4, the filter is experimentally verified on real image databases. Section 5 gives conclusions.

1.1 Image Representation and Similarity Measures

This section reviews the BoVW model based on SIFT for representing image features, as well as Minwise Hashing for measuring set similarity. The relevant notations are shown in Table 1.
Scale Invariant Feature Transform (SIFT) [12] is a computer vision algorithm which detects and describes local features in images. The SIFT feature descriptor is based on the appearance of the object at particular interest points. Besides, the descriptor is invariant to uniform scaling, orientation and illumination changes, and partially invariant to affine distortion. In the Bag-of-Visual-Words model, a set of visual words V [16,17,37,25,26] is constructed from the SIFT descriptors. The method builds the vocabulary through the K-means clustering algorithm on the training image datasets [13,14,27,35,36,34,28]. For each image, the SIFT features are assigned to the nearest cluster center to give the visual word representation. For a vocabulary V, each visual word is encoded with a unique identifier from 1 to |V|, where |V| is defined as the size of the vocabulary V. A set S_i of words, S_i ⊆ V, is a local representation which does not store the number of features but only records whether they are present or not.
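The assignment step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `centers` stands for hypothetical K-means cluster centers and `descriptors` for one image's SIFT features (real SIFT descriptors are 128-dimensional; tiny 2-D vectors are used here to keep the example readable).

```python
# Minimal sketch of the BoVW assignment step, with hypothetical toy data.

def nearest_word(desc, centers):
    """Return the id (1..|V|) of the cluster center closest to desc."""
    best_id, best_dist = None, float("inf")
    for word_id, c in enumerate(centers, start=1):
        d = sum((a - b) ** 2 for a, b in zip(desc, c))  # squared Euclidean
        if d < best_dist:
            best_id, best_dist = word_id, d
    return best_id

def image_to_word_set(descriptors, centers):
    """Represent an image as the SET of visual word ids it contains."""
    return {nearest_word(d, centers) for d in descriptors}

centers = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]      # vocabulary, |V| = 3
descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8)]  # one image's features
print(image_to_word_set(descriptors, centers))      # -> {1, 2}
```

Note that the result is a set, matching the paper's choice of recording only word presence, not word counts.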
Therefore, each image can be represented by a visual word set S. Measuring the similarity of a pair of images is then equivalent to measuring the similarity of their visual word sets [4,20]. We assume that S1 and S2 are the visual word sets of a pair of images, so the similarity between the two images can be defined as the Jaccard coefficient:

R = sim(S1, S2) = |S1 ∩ S2| / |S1 ∪ S2| = Jaccard(S1, S2)   (1)

Minwise Hashing (or Minhash) is a Locality Sensitive Hashing scheme and is considered one of the most popular similarity estimation methods. It keeps a sketch of the data and provides an unbiased estimate of pairwise Jaccard similarity. In 1997, Andrei Broder and his colleagues invented the Minwise Hashing algorithm for near-duplicate web page detection and clustering [2]. Recently, the algorithm has been widely used in many applications, including duplicate detection [5], all-pairs similarity [1], nearest neighbor search [6], large-scale learning [10] and computer vision [30,32,33]. Minwise Hashing requires K (commonly, K = 1000) independent random permutations to process the datasets. Let π : Ω → Ω denote a random permutation function, and let min(π(S)) = min_{i ∈ S} π(i). The similarity between two non-empty sets S1 and S2 satisfies:

Pr(min(π(S1)) = min(π(S2))) = |S1 ∩ S2| / |S1 ∪ S2| = Jaccard(S1, S2)   (2)

Generating K random permutations π1, π2, ..., πK independently, the estimator of Minwise Hashing is:

R_M(S1, S2) = (1/K) Σ_{i=1}^{K} 1{min(πi(S1)) = min(πi(S2))}   (3)

R_M is an unbiased estimator of J(S1, S2), with variance:

Var(R_M) = (1/K) J(S1, S2)(1 − J(S1, S2))   (4)

1.2 Dynamic Threshold Filter

In order to estimate the similarity of sets, the hashing algorithms generate K fingerprints (or hash values) by K random permutations and obtain the unbiased estimate R_M after comparing the fingerprints. For example, with 1 million set pairs and K = 1000, we need 1 billion comparisons.
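The estimator in (3) can be sketched directly. This is my own illustration under assumed toy sets, not the paper's implementation; each random permutation is simulated by ranking the elements of the universe with fresh random keys.

```python
import random

def minhash_estimate(s1, s2, K=1000, seed=0):
    """Unbiased Minwise Hashing estimate R_M of Jaccard(s1, s2)."""
    rng = random.Random(seed)
    universe = s1 | s2
    matches = 0                 # X: number of equal fingerprints
    for _ in range(K):
        # One random permutation pi: rank elements by random keys.
        keys = {e: rng.random() for e in universe}
        if min(s1, key=keys.get) == min(s2, key=keys.get):
            matches += 1
    return matches / K          # R_M = X / K

s1 = set(range(0, 60))          # toy visual-word sets
s2 = set(range(30, 90))         # true Jaccard = 30/90 = 1/3
print(minhash_estimate(s1, s2)) # close to 0.333 for K = 1000
```

By equation (4), the standard deviation of this estimate is sqrt(J(1 − J)/K) ≈ 0.015 here, so individual runs scatter tightly around 1/3.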
These large-scale comparisons consume a lot of computation time and storage space. Following the clustering algorithm for web pages [10], we also cluster the large-scale image datasets by their visual words and generate a series of image pairs, namely (Image1, Image2), (Image2, Image3), ..., (Image_{n−1}, Image_n). The corresponding set pairs are defined as (S1, S2), (S2, S3), ..., (S_{n−1}, S_n). We set a specified threshold T (e.g., 0.5) and output the set pairs {(S_{i−1}, S_i) | R(S_{i−1}, S_i) ≥ T, 1 ≤ i ≤ n}, as shown in Fig. 1.

Fig. 1: Filter Example

In some multimedia applications, e.g. near-duplicate detection, clustering, nearest neighbor search, etc., we only care about the pairwise datasets whose similarities are larger than a specified threshold T (e.g., 0.5). The strategy of the filter is: given a small probability e and a specified threshold T, at the k-th (0 < k ≤ K) comparison the set pairs whose similarities are smaller than the lower threshold T_L(k) are filtered out. Similarly, the set pairs whose similarities are higher than the upper threshold T_U(k) are output. According to the above, we can set a filter that outputs or filters out some set pairs in advance during the comparison process. The strategy of the filter is shown in Fig. 2. This pre-filtering greatly reduces the number of comparisons and the computation time.

When the estimator satisfies the binomial distribution, we can build a dynamic threshold filter through hypothesis testing and the small probability event. It greatly reduces the calculation time by terminating unnecessary comparisons in advance. In this paper, we take the Minwise Hashing algorithm as an example and build the filter for image similarity measures. As we know, each image can be represented by a set S_i of visual words through the BoVW model. Then, we can use the Minwise Hashing algorithm to process the set pair (S_{i−1}, S_i) and obtain the corresponding fingerprints. The relevant definitions follow. The random variable X is the number of times that the fingerprints are equal, that is:

X = Σ_{j=1}^{K} 1{min(πj(S_{i−1})) = min(πj(S_i))}   (5)

Fig. 2: Filter Strategy
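The match count X defined in (5) is a sum of independent indicator variables, so it is binomial; a quick sanity check (my own simulation on assumed toy sets, not from the paper) confirms that its sample mean and variance are near k·R and k·R·(1 − R).

```python
import random
from statistics import mean, pvariance

def match_count(s1, s2, k, rng):
    """X: how many of k simulated permutations give equal min-hashes."""
    universe = list(s1 | s2)
    x = 0
    for _ in range(k):
        # One random permutation = a random ranking of the universe.
        keys = {e: rng.random() for e in universe}
        if min(s1, key=keys.get) == min(s2, key=keys.get):
            x += 1
    return x

rng = random.Random(1)
s1, s2 = set(range(60)), set(range(30, 90))   # true Jaccard R = 1/3
k = 100
xs = [match_count(s1, s2, k, rng) for _ in range(200)]
# For X ~ B(k, R): E[X] = k*R = 33.3..., Var[X] = k*R*(1-R) = 22.2...
print(mean(xs), pvariance(xs))
```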
Obviously, the random variable X satisfies the binomial distribution, denoted X ~ B(k, R_M). Thus, the distribution function F(x) of the variable X is:

F(x) = Σ_{i=0}^{m} C(k, i) R_M^i (1 − R_M)^{k−i},   x ≤ m
F(x) = Σ_{i=m+1}^{k} C(k, i) R_M^i (1 − R_M)^{k−i},   x > m   (6)

where the variable m is in the interval (0, k]. R_M is the estimator of Minwise Hashing after all K comparisons:

R ≈ R_M = X / K   (7)

The variable R_M(k) is defined as the estimated similarity of Minwise Hashing at the k-th (0 < k ≤ K) comparison:

R_M(k) = X / k   (8)

The Lower Threshold

Lemma 1
Given a threshold T and a small probability e, at the k-th (0 < k ≤ K) comparison we can obtain the solution m = m_l from the following equation:

Σ_{i=0}^{m} C(k, i) T^i (1 − T)^{k−i} = e,   0 < k ≤ K   (9)

Then we set the lower threshold T_L(k) = m_l / k. It is obvious that R_M < T when R_M(k) ≤ T_L(k).
Table 2: The values of F(x) for different m

m     F(x)              m     F(x)
10    1.53 × 10^-17     60    0.982
20    5.6 × 10^-10      70    0.999
30    3.9 × 10^-5       80    0.999
40    0.028             90    0.999
50    0.539             100   1
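The Table 2 entries are binomial CDF values for B(100, 0.5), and m_l can be found by a simple scan; the linear scan below is my own sketch, not the paper's procedure.

```python
from math import comb

def binom_cdf(m, k, p):
    """F(m) = Pr(X <= m) for X ~ B(k, p)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(m + 1))

def lower_m(k, T, e):
    """Largest m with Pr(X <= m) <= e, giving T_L(k) = m_l / k (Lemma 1)."""
    m = -1
    while binom_cdf(m + 1, k, T) <= e:
        m += 1
    return m

k, T = 100, 0.5
print(binom_cdf(20, k, T))       # about 5.6e-10, the m = 20 entry of Table 2
print(lower_m(k, T, e=5.6e-10))  # 20, so T_L(100) = 20/100 = 0.2
```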
We use hypothesis testing to prove Lemma 1.
Proof
Assume H0: R_M ≥ T and H1: R_M < T, and that the random variable X satisfies the binomial distribution X ~ B(k, R_M) at the k-th (0 < k ≤ K) comparison point. The probability of the event X ≤ m is:

Pr(X ≤ m) = Σ_{i=0}^{m} C(k, i) R_M^i (1 − R_M)^{k−i} ≤ e

where m satisfies 0 < m ≤ k. Obviously, the event X ≤ m is a small probability event. When R_M(k) ≤ T_L(k), we know that:

X / k ≤ m_l / k

Then,

X ≤ m_l ≤ m

In other words, when R_M(k) ≤ T_L(k), the small probability event X ≤ m occurs in an experiment. Therefore, we should reject the hypothesis H0 and accept the hypothesis H1. According to the above discussion, the estimated similarity R_M is less than the threshold T, and the lemma is proved.

The following is an example of Lemma 1. Given a set pair (S_{i−1}, S_i), when K = 1000, T = 0.5 and k = 100, the random variable X satisfies the binomial distribution X ~ B(100, T). The distribution function F(x) of X for different m is shown in Table 2. When x = 20, it is obvious that Pr(X ≤ 20) ≤ 5.6 × 10^-10, so X ≤ 20 is a small probability event, according to Table 2. Assume H0: R_M ≥ T = 0.5 and H1: R_M < T = 0.5. We select the small probability e = 5.6 × 10^-10 and m = m_l = 20, and obtain the lower threshold T_L(100) = 20/100 = 0.2. At the 100-th comparison, suppose R_M(k) = X/k ≤ T_L(k) = 0.2. The probability of the event X ≤ 20 is only 5.6 × 10^-10. Clearly, X ≤ 20 is a small probability event; however, it occurs in an experiment. According to the above discussion, we must reject the hypothesis H0: R_M ≥ T = 0.5 and accept the hypothesis H1: R_M < T = 0.5. This means the similarity of the pair (S_{i−1}, S_i) is less than the threshold T = 0.5.

Fig. 3 shows the original comparison process: the set pairs are output when their similarities R_M are higher than the threshold T. In our method, we add the lower threshold T_L(100) at the 100-th comparison point, as shown in Fig. 4. If R_M(k = 100) ≤ T_L(100), we can immediately conclude that R_M(k = 1000) < T and there is no need for the remaining comparisons. However, when R_M(k = 100) > T_L(100), we perform the remaining 900 comparisons and calculate R_M(k = 1000).

Fig. 3: Original comparison process

Fig. 4: Our comparison process

To sum up, during the Minwise Hashing comparison process, we set the small probability e, the threshold T and the estimated similarity R_M(k) at the k-th (0 < k ≤ K) observation point. If R_M(k) ≤ T_L(k), we can predict R_M < T; therefore, there is no need for the rest of the comparisons. Comparing Fig. 3 with Fig. 4, the method effectively saves computing time.

The Upper Threshold
Meanwhile, there must be an upper threshold T_U.
Lemma 2
Given a threshold T and a small probability e, according to the equation:

Σ_{i=m+1}^{k} C(k, i) T^i (1 − T)^{k−i} = e,   0 < k ≤ K   (12)

we can obtain m = m_u and the upper threshold T_U(k) = m_u / k. It is clear that R_M > T when R_M(k) ≥ T_U(k).

Proof Assume H0: R_M < T and H1: R_M ≥ T, and that the random variable X satisfies the binomial distribution X ~ B(k, R_M) at the k-th (0 < k ≤ K) comparison. The probability of the event X > m is:

Pr(X > m) = Σ_{i=m+1}^{k} C(k, i) R_M^i (1 − R_M)^{k−i} ≤ e

where m satisfies 0 < m ≤ k. Obviously, the event X > m is a small probability event. When R_M(k) > T_U(k), we know that:

X / k > m_u / k

Then,

X > m_u > m

That is to say, when R_M(k) > T_U(k), the small probability event X > m occurs in an experiment; we should reject the hypothesis H0 and accept the hypothesis H1. Therefore, Lemma 2 is proved.

The following is an example of Lemma 2. Given a set pair (S_{i−1}, S_i), when K = 1000, T = 0.5 and k = 100, the random variable X satisfies X ~ B(100, T). The values of 1 − F(x) for different m are shown in Table 3. According to Table 3, when x = 80, it is obvious that Pr(X > 80) < 1.35 × 10^-10, so the event X > 80 is a small probability event. We assume that H0: R_M < T = 0.5 and H1: R_M ≥ T = 0.5. Identically, we select e = 1.35 × 10^-10 and m = m_u = 80, and obtain the upper threshold T_U(100) = 80/100 = 0.8. At the 100-th comparison, suppose R_M(k) = X/k > T_U(k) = 0.8. The probability of the event X > 80 is only 1.35 × 10^-10. Clearly, X > 80 is a small probability event; however, it happens in an experiment. According to the above discussion, we have to reject the hypothesis H0: R_M < T = 0.5 and accept the hypothesis H1: R_M ≥ T = 0.5. That is, the estimated similarity R_M of the pair (S_{i−1}, S_i) satisfies R_M ≥ T = 0.5.

In short, during the Minwise Hashing comparison process, we can set the small probability e, the similarity threshold T and the estimated similarity R_M(k) at the k-th (0 < k ≤ K) comparison point. There exists the upper threshold T_U(k) = m_u / k. If R_M(k) > T_U(k), we can predict R_M > T, and there is no need for the following K − k comparisons. As shown in Fig. 5, this effectively saves computing time in image similarity measures. When k = 100, we can add the upper threshold T_U(100). If R_M(k = 100) > T_U(100), we can easily conclude that R_M(k = 1000) > T, and there is no need for the following 900 comparisons. If R_M(k = 100) ≤ T_U(100), we perform the remaining 900 comparisons and continue to calculate R_M(k = 1000).

Fig. 5:
Upper Threshold Filter
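Symmetrically to the lower threshold, m_u can be computed from the binomial tail. The scan below, and the value of e rounded slightly upward so the boundary case is included, are my own choices, not the paper's:

```python
from math import comb

def binom_tail(m, k, p):
    """Pr(X > m) for X ~ B(k, p)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(m + 1, k + 1))

def upper_m(k, T, e):
    """Smallest m with Pr(X > m) <= e, giving T_U(k) = m_u / k (Lemma 2)."""
    m = k
    while m > 0 and binom_tail(m - 1, k, T) <= e:
        m -= 1
    return m

k, T = 100, 0.5
print(binom_tail(80, k, T))      # about 1.35e-10, as in the Lemma 2 example
# e set a hair above the exact tail value so m = 80 satisfies the inequality
print(upper_m(k, T, e=1.4e-10))  # 80, so T_U(100) = 80/100 = 0.8
```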
Combining the above discussion, we can set the upper and the lower thresholds simultaneously at the k-th (0 < k ≤ K) comparison point. The threshold filter can eliminate or output the predictable set pairs ahead of time. Obviously, we can build a series of comparison points k1, k2, ..., ki, ..., kn (0 < ki ≤ K, 1 ≤ i ≤ n).

Algorithm 1: Dynamic threshold filter
Input: All the set pairs (S1, S2), (S2, S3), ..., (S_{n−1}, S_n); a specified threshold T; a small probability e; a series of comparison points k1, k2, ..., ki, ..., kn (0 < ki ≤ K, 1 ≤ i ≤ n)
Output: The set pairs whose estimated similarities are greater than the threshold T: {(S_{i−1}, S_i) | R(S_{i−1}, S_i) ≥ T, 1 ≤ i ≤ n}
1: for each set pair (S_{i−1}, S_i) do
2:   for k = 1 : K do
3:     R = R(k);
4:     i = 1;
5:     if k = ki then
6:       T_L(k) = m_l / k; T_U(k) = m_u / k;
7:       if R ≥ T_U(k) then
8:         Output the set pair (S_{i−1}, S_i); Break;
9:       end if
10:      if R ≤ T_L(k) then
11:        Filter out the set pair (S_{i−1}, S_i); Break;
12:      end if
13:      i++;
14:    end if
15:  end for
16: end for

The inputs of the algorithm include all the set pairs (S1, S2), (S2, S3), ..., (S_{n−1}, S_n), as well as the parameters we set ahead of time: a specified threshold T, a small probability e and a series of comparison points k1, k2, ..., ki, ..., kn (0 < ki ≤ K, 1 ≤ i ≤ n). The outputs are the set pairs whose estimated similarities are greater than the threshold T after the comparisons: {(S_{i−1}, S_i) | R(S_{i−1}, S_i) ≥ T, 1 ≤ i ≤ n}. Line 3 shows that the algorithm calculates the estimated similarity R = R(ki) through the top k fingerprints. Lines 5-15 describe how the algorithm calculates the lower threshold T_L(ki) and the upper threshold T_U(ki) through equations (9) and (12) at the comparison point k = ki. If R ≥ T_U(ki), we can judge that R ≥ T, and the set pair (S_{i−1}, S_i) can be output ahead of time.
Besides, if R ≤ T_L(ki), we can conclude that R < T, and the set pair (S_{i−1}, S_i) can be filtered out in advance. Otherwise, the algorithm enters the next point k_{i+1} and continues to compare the remaining fingerprints. Fig. 6 gives an example of the entire comparison process: we can set ki = 100, 200, ..., and define the lower threshold T_L(ki) and the upper threshold T_U(ki) for each comparison point ki.

Fig. 6: Lower Threshold Filter

(1) We compare the comparison time of the original Minwise Hashing and the dynamic threshold filter. As shown in Fig. 7, Minwise Hashing with the filter greatly reduces the calculation time. When comparing 10^4 set pairs, we can easily see that the comparison time is inversely proportional to the value of the small probability. The comparison time of the original Minwise Hashing is 30 × 10^3 ms. After using the filter with a small probability e = 10^-3, the calculation time is the smallest, only 9.3 × 10^3 ms, which is 31% of the original Minwise Hashing.

Fig. 7: Vary Number of Set Pairs

(2) We select three copies of 4000 set pairs whose similarities (measured by the Jaccard coefficient) are about 80%, 50% and 30%, respectively. As shown in Fig. 8, the filter is more effective for the set pairs whose similarities are very low or very high. If the similarities of the set pairs are mostly high or low, the comparison time will be small.

Fig. 8: Vary Similarity Threshold

The filtering rate refers to the probability that the set pairs are excluded or output in advance at the k-th comparison point. Therefore, we define the filtering rate (FR) at comparison point k as:

FR(T, k, e) = |{(S_{i−1}, S_i) | R_M(k) ≤ T_L(k) or R_M(k) ≥ T_U(k)}| / Num

where the variable Num represents the total number of set pairs and Num = 3 × 10^5.

Fig. 9: Vary Comparison Point, When T=0.3

Fig. 10: Vary Comparison Point, When T=0.5

Obviously, the filtering rate has a great relationship with the input data.
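Algorithm 1 and the filtering rate above can be sketched end to end. This is my own illustration on hypothetical toy Minhash signatures, not the authors' code: each pair is either decided early at a comparison point or carried through all K fingerprints, and the filtering rate is the fraction of pairs decided early.

```python
from math import comb

def binom_cdf(m, k, p):
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(m + 1))

def thresholds(k, T, e):
    """(T_L(k), T_U(k)) per Lemmas 1 and 2."""
    m_l = -1
    while binom_cdf(m_l + 1, k, T) <= e:                  # largest m: CDF <= e
        m_l += 1
    m_u = k
    while m_u > 0 and 1 - binom_cdf(m_u - 1, k, T) <= e:  # smallest m: tail <= e
        m_u -= 1
    return m_l / k, m_u / k

def filtered_compare(sig1, sig2, T, e, points):
    """Algorithm 1 for one pair of length-K Minhash signatures.

    Returns 'output' / 'filtered' if the pair is decided early at a
    comparison point, otherwise the final estimate R_M = X / K.
    """
    K = len(sig1)
    pts = set(points)
    x = 0                              # running match count X
    for k in range(1, K + 1):
        x += (sig1[k - 1] == sig2[k - 1])
        if k in pts:
            t_lo, t_hi = thresholds(k, T, e)
            r_k = x / k                # R_M(k)
            if r_k >= t_hi:
                return "output"        # predict R_M >= T early
            if r_k <= t_lo:
                return "filtered"      # predict R_M < T early
    return x / K

# Toy signatures: identical, fully disjoint, and half-matching pairs.
sigA = list(range(1000))
sigB = [-v - 1 for v in sigA]
sigC = [v if i % 2 == 0 else -v - 1 for i, v in enumerate(sigA)]
points = [100, 200, 300]
pairs = [(sigA, sigA), (sigA, sigB), (sigA, sigC)]
results = [filtered_compare(a, b, 0.5, 1e-9, points) for a, b in pairs]
decided_early = sum(r in ("output", "filtered") for r in results)
print(results, decided_early / len(pairs))  # filtering rate on the toy pairs
```

The identical and disjoint pairs are decided at the first comparison point (k = 100), while the half-matching pair survives all points and yields its full estimate of 0.5.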
When the similarities of the set pairs are mostly low, the filtering rate will be high. According to the above equation, we measure the filtering rate (FR) for different small probabilities e = 10^-10, 10^-5, 10^-3, as shown in Fig. 9, Fig. 10 and Fig. 11. As the small probability increases, the filtering rate also increases. This means that more set pairs are excluded or output in advance, and the calculation time is smaller. For example, FR(0.3, 100, 10^-10) = 10%, FR(0.3, 100, 10^-5) = 60% and FR(0.3, 100, 10^-3) = 72% when k = 100 and T = 0.3. Among them, FR(0.3, 100, 10^-3) = 72% means that 72% of the set pairs save the remaining 900 comparisons.

Fig. 11: Vary Comparison Point, When T=0.8

We select three groups of data with actual similarity about 80%, 50% and 30%. Each group includes 4 × 10^3 pairs of sets. In Fig. 12, we analyze the accuracy of the filter. We find that the accuracy of the filter is extremely close to 1.0, so the error is negligible. That is to say, the image similarity estimated by the filter is almost the same as that of the original Minhash. The main reason may be that small probability events seldom happen in similarity measurement experiments. In addition, we find that the smaller the probability e is, the higher the accuracy is.

Fig. 12: Vary Small Probability

In this paper, we use the Bag-of-Words model and a 128-dimensional SIFT descriptor for image feature representation. The method has achieved great success in computer vision applications, such as image matching, near-duplicate detection and image search. Inspired by the successes of Minwise Hashing in computer vision, we combine the binomial distribution with the small probability event and propose a dynamic threshold filter for large-scale image similarity measures. It greatly reduces the calculation time by terminating unnecessary comparisons in advance.
Besides, we find that the filter can be extended to other hashing algorithms for image similarity measures, such as b-Bit Minwise Hashing, One Permutation Hashing, etc. Our experimental results on the image database Caltech256 prove that the filter is effective and correct.

Acknowledgments: This work was supported in part by the National Natural Science Foundation of China (61379110, 61472450, 61702560), the Key Research Program of Hunan Province (2016JC2018), projects (2016JC2011, 2018JJ3691) of the Science and Technology Plan of Hunan Province, and the Fundamental Research Funds for Central Universities of Central South University (2018zzts588).

References

1. Bayardo, R.J., Ma, Y., Srikant, R.: Scaling up all pairs similarity search. In: Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pp. 131–140 (2007)
2. Broder, A.Z., Glassman, S.C., Manasse, M.S., Zweig, G.: Syntactic clustering of the web. Computer Networks (8-13), 1157–1166 (1997)
3. Chum, O., Matas, J.: Fast computation of min-hash signatures for image collections. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, June 16-21, 2012, pp. 3077–3084 (2012)
4. Chum, O., Philbin, J., Zisserman, A.: Near duplicate image detection: min-hash and tf-idf weighting. In: Proceedings of the British Machine Vision Conference 2008, Leeds, UK, September 2008, pp. 1–10 (2008)
5. Henzinger, M.R.: Finding near-duplicate web pages: a large-scale evaluation of algorithms. In: SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6-11, 2006, pp. 284–291 (2006)
6. Indyk, P., Motwani, R.: Approximate nearest neighbors: Towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, Dallas, Texas, USA, May 23-26, 1998, pp. 604–613 (1998)
7.
Jain, P., Kulis, B., Grauman, K.: Fast image search for learned metrics. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 24-26 June 2008, Anchorage, Alaska, USA (2008)
8. Jegou, H., Douze, M., Schmid, C., Pérez, P.: Aggregating local descriptors into a compact image representation. In: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010, pp. 3304–3311 (2010)
9. Karami, E., Prasad, S., Shehata, M.S.: Image matching using SIFT, SURF, BRIEF and ORB: performance comparison for distorted images. CoRR abs/1710.02726 (2017)
10. Li, P., Shrivastava, A., Moore, J.L., König, A.C.: Hashing algorithms for large-scale learning. In: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 12-14 December 2011, Granada, Spain, pp. 2672–2680 (2011)
11. Liu, Z., Li, H., Zhang, L., Zhou, W., Tian, Q.: Cross-indexing of binary SIFT codes for large-scale image search. IEEE Trans. Image Processing (5), 2047–2057 (2014)
12. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (2), 91–110 (2004)
13. Nistér, D., Stewénius, H.: Scalable recognition with a vocabulary tree. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), 17-22 June 2006, New York, NY, USA, pp. 2161–2168 (2006)
14. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 18-23 June 2007, Minneapolis, Minnesota, USA (2007)
15. Qu, Y., Song, S., Yang, J., Li, J.: Spatial min-hash for similar image search.
In: International Conference on Internet Multimedia Computing and Service, ICIMCS '13, Huangshan, China, August 17-19, 2013, pp. 287–290 (2013)
16. Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: 9th IEEE International Conference on Computer Vision (ICCV 2003), 14-17 October 2003, Nice, France, pp. 1470–1477 (2003)
17. Wang, Y., Lin, X., Wu, L., Zhang, Q., Zhang, W.: Shifting multi-hypergraphs via collaborative probabilistic voting. Knowledge and Information Systems, 515–536 (2016)
18. Wang, Y., Lin, X., Wu, L., Zhang, W.: Effective multi-query expansions: Robust landmark retrieval. In: Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26-30, 2015, pp. 79–88 (2015)
19. Wang, Y., Lin, X., Wu, L., Zhang, W.: Effective multi-query expansions: Collaborative deep networks for robust landmark retrieval. IEEE Trans. Image Processing (3), 1393–1404 (2017)
20. Wang, Y., Lin, X., Wu, L., Zhang, W., Zhang, Q.: Exploiting correlation consensus: Towards subspace clustering for multi-modal data. In: Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 3-7, 2014, pp. 981–984 (2014)
21. Wang, Y., Lin, X., Wu, L., Zhang, W., Zhang, Q.: LBMCH: learning bridging mapping for cross-modal hashing. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, August 9-13, 2015, pp. 999–1002 (2015)
22. Wang, Y., Lin, X., Wu, L., Zhang, W., Zhang, Q., Huang, X.: Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Trans. Image Processing (11), 3939–3949 (2015)
23. Wang, Y., Lin, X., Zhang, Q.: Towards metric fusion on multi-view data: a cross-view based graph random walk approach. In: 22nd ACM International Conference on Information and Knowledge Management, CIKM '13, San Francisco, CA, USA, October 27 - November 1, 2013, pp. 805–810 (2013)
24.
Wang, Y., Lin, X., Zhang, Q., Wu, L.: Shifting hypergraphs by probabilistic voting. In: Advances in Knowledge Discovery and Data Mining - 18th Pacific-Asia Conference, PAKDD 2014, Tainan, Taiwan, May 13-16, 2014, Proceedings, Part II, pp. 234–246 (2014)
25. Wang, Y., Wu, L.: Beyond low-rank representations: Orthogonal clustering basis reconstruction with optimized graph structure for multi-view spectral clustering. Neural Networks, 1–8 (2018)
26. Wang, Y., Wu, L., Lin, X., Gao, J.: Multiview spectral clustering via structured low-rank matrix factorization. IEEE Trans. Neural Networks and Learning Systems (2018)
27. Wang, Y., Zhang, W., Wu, L., Lin, X., Fang, M., Pan, S.: Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, pp. 2153–2159 (2016)
28. Wang, Y., Zhang, W., Wu, L., Lin, X., Zhao, X.: Unsupervised metric fusion over multiview data by graph random walk-based cross-view diffusion. IEEE Trans. Neural Netw. Learning Syst. (1), 57–70 (2017)
29. Wu, L., Wang, Y.: Robust hashing for multi-view data: Jointly learning low-rank kernelized similarity consensus and hash functions. Image Vision Comput., 58–66 (2017)
30. Wu, L., Wang, Y., Gao, J., Li, X.: Deep adaptive feature embedding with local sample distributions for person re-identification. Pattern Recognition, 275–288 (2018)
31. Wu, L., Wang, Y., Ge, Z., Hu, Q., Li, X.: Structured deep hashing with convolutional neural networks for fast person re-identification. Computer Vision and Image Understanding, 63–73 (2018)
32. Wu, L., Wang, Y., Li, X., Gao, J.: Deep attention-based spatially recursive networks for fine-grained visual recognition. IEEE Trans. Cybernetics (2018)
33. Wu, L., Wang, Y., Li, X., Gao, J.: What-and-where to match: Deep spatially multiplicative integration networks for person re-identification.
Pattern Recognition, 727–738 (2018)
34. Wu, L., Wang, Y., Shao, L.: Cycle-consistent deep generative hashing for cross-modal retrieval. arXiv:1804.11013 (2018)
35. Wu, L., Wang, Y., Shepherd, J.: Co-ranking images and tags via random walks on a heterogeneous graph. In: International Conference on Multimedia Modeling, pp. 228–238 (2013)
36. Wu, L., Wang, Y., Shepherd, J.: Efficient image and tag co-ranking: a Bregman divergence optimization method. In: ACM Multimedia (2013)
37. Wu, L., Wang, Y., Shepherd, J., Zhao, X.: Max-sum diversification on image ranking with non-uniform matroid constraints. Neurocomputing, 10–20 (2013)
38. Zhao, W., Jégou, H., Gravier, G.: Sim-min-hash: an efficient matching technique for linking large image collections. In: ACM Multimedia Conference, MM '13, Barcelona, Spain, October 21-25, 2013, pp. 577–580 (2013)
39. Zheng, L., Wang, S., Liu, Z., Tian, Q.: Fast image retrieval: Query pruning and early termination. IEEE Trans. Multimedia (5), 648–659 (2015)
40. Zheng, L., Wang, S., Zhou, W., Tian, Q.: Bayes merging of multiple vocabularies for scalable image retrieval. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23-28, 2014, pp. 1963–1970 (2014)
41. Zhou, W., Li, H., Lu, Y., Tian, Q.: SIFT match verification by geometric coding for large-scale partial-duplicate web image search. TOMCCAP