Fusion Hashing: A General Framework for Self-improvement of Hashing
Xingbo Liu, Xiushan Nie, Member, IEEE, and Yilong Yin

X. Liu is with the School of Computer Science and Technology, Shandong University, Jinan, P.R. China; X. Nie is with the School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, P.R. China; Y. Yin is with the School of Software, Shandong University, Jinan, P.R. China (e-mail: [email protected]; [email protected]; [email protected]). (Corresponding authors: Xiushan Nie and Yilong Yin.)
Abstract—Hashing has been widely used for efficient similarity search owing to its query and storage efficiency. To obtain better precision, most studies focus on designing different objective functions with different constraints or penalty terms that consider neighborhood information. In this paper, in contrast to existing hashing methods, we propose a novel generalized framework called fusion hashing (FH) to improve the precision of existing hashing methods without adding new constraints or penalty terms. In the proposed FH, given an existing hashing method, we first execute it several times to obtain several different hash codes for a set of training samples. We then propose two novel fusion strategies that combine these different hash codes into one set of final hash codes. Based on the final hash codes, we learn a simple linear hash function for the samples that can significantly improve model precision. In general, the proposed FH can be adopted in existing hashing methods and achieves more precise and stable performance than the original hashing method, with little extra expenditure in terms of time and space. Extensive experiments were performed on three benchmark datasets, and the results demonstrate the superior performance of the proposed framework.
Index Terms—Hashing, approximate nearest neighbor search, fusion hashing, self-improvement
I. INTRODUCTION
The amount of big data has grown explosively in recent years, and the approximate nearest neighbor (ANN) search, which takes a query point and finds its ANNs within a large database, has been shown to be useful for many practical applications, such as computer vision, information retrieval, data mining, and machine learning. Hashing is a primary technique for ANN search and has become one of the most popular candidates for performing ANN searches because it outperforms many other methods in most real applications [1] [2].

Hashing attempts to convert documents, images, videos, and other types of data into a set of short binary codes that preserve the similarity relationships in the original data. By utilizing these binary codes, ANN searches can be performed more easily on large-scale datasets because of the high efficiency of pairwise comparisons based on Hamming distance [3]. Learning-based hashing is one of the most accurate hashing approaches because it can achieve better retrieval performance by analyzing the underlying characteristics of the data. Therefore, learning-based hashing has become popular because the learned compact hash codes can index and organize massive amounts of data effectively and efficiently.
Learning-based hashing is the task of learning a (compound) hash function b = h(x) that maps an input item x to a compact code b. The hash function can have a form based on a linear projection, kernel, spherical function, neural network, nonparametric function, etc. Hash functions are an important factor influencing search accuracy when utilizing hash codes. The time cost of computing hash codes is also important. A linear function can be evaluated efficiently, whereas kernel functions and nearest-vector-assignment-based functions provide better search accuracy because they are more flexible. Nearly all methods utilizing a linear hash function can be extended to kernelized hash functions. The most commonly used hash functions take the form of a generalized linear projection:

b = h(x) = sgn(f(w^T x + t)),   (1)

where sgn(z) = 1 if z > 0 and sgn(z) = 0 (or equivalently −1) otherwise, w is the projection vector, and t is the bias variable. Here, f(·) is a pre-specified general linear function. Different choices of f(·) yield different properties for hash functions, leading to a wide range of hashing approaches. For example, locality sensitive hashing (LSH) keeps f(·) as an identity function, whereas shift-invariant kernel-based hashing and spectral hashing set f(·) to be a shifted cosine or sinusoidal function [4] [5].

Various algorithms have been developed and exploited to optimize hash function parameters. Randomized hashing approaches [6] [7] often utilize random projections or permutations. Learning-based hashing frameworks exploit data distributions and various levels of supervised information to determine the optimal parameters for hash functions. Supervised information includes pointwise labels, pairwise relationships, and ranking orders [8] [9] [10] [11] [12].

In general, most existing hashing methods attempt to design a loss function (objective function) that preserves the similarity order in the target data (i.e., minimizes the gap between the ANN search results computed from the hash codes and the true search results obtained from the input data) by adding constraints or penalty terms.

In contrast to existing hashing methods, in this study, we explored a novel strategy that can facilitate the self-improvement of existing hashing methods without adding or changing any terms in their objective functions. The proposed strategy is a two-step method. We first learn several different hash codes by utilizing a given hashing method, then fuse the codes according to various rules. Finally, a simple linear hash function is learned for out-of-sample extension. We call this novel framework fusion hashing (FH). FH can be utilized to provide self-improvement to existing hashing methods. The main contributions of this study are summarized as follows:

• A general framework for hashing self-improvement is proposed.
The proposed FH method can be applied to existing hashing methods without changing the objective function of the original hashing method, and it results in better precision than the original hashing method.

• Two hash code fusion strategies are proposed. In the proposed framework, two hash code fusion strategies are proposed, and we perform a theoretical analysis to guide the fusion process. Through the fusion of hash codes, we can learn new hash functions for out-of-sample extension.

• Experiments on three large-scale datasets demonstrate that the proposed framework can improve different types of hashing methods in terms of precision.

The remainder of this paper is organized as follows. In Section 2, we describe the proposed FH method in detail. Evaluations based on experiments are presented in Section 3. We discuss the conclusions of our study in Section 4.
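As a concrete illustration of the generalized linear projection in Eq. (1), the following minimal sketch (our own example, assuming NumPy; the identity choice of f corresponds to the LSH case mentioned above, and one column of W plays the role of w for each bit) shows how such a hash function maps a feature vector to a binary code.

```python
import numpy as np

def linear_hash(x, W, t):
    """Generalized linear hash function b = sgn(f(W^T x + t)) with f = identity.

    x: (d,) feature vector; W: (d, L) projection matrix (one column per bit); t: (L,) bias.
    Returns an L-bit code in {0, 1} (equivalently {-1, +1}).
    """
    z = W.T @ x + t                 # linear projection, one value per hash bit
    return (z > 0).astype(np.int8)  # sgn thresholding at zero

# Toy usage with random projections (as in LSH) on a 16-dimensional feature.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))    # d = 16 features, L = 8 bits
t = np.zeros(8)
x = rng.standard_normal(16)
print(linear_hash(x, W, t))         # e.g. an 8-bit binary code
```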
II. PROPOSED METHOD
The proposed FH is a two-step framework that first optimizes binary codes utilizing hash code fusion, then estimates the hash function parameters based on the optimized hash codes. Given an existing hashing method, the proposed FH provides self-improvement capabilities. A flowchart of the proposed FH framework is presented in Fig. 1. FH consists of a hash code fusion step and a hash learning step. In the hash code fusion step, we first run a given hashing method T times to obtain T hash matrices for all samples. We then fuse the T hash matrices utilizing two different fusion strategies. In the hash learning step, we learn a simple linear hash function based on the fused hash codes for out-of-sample extension.

In the following subsections, we first present some notations for FH and then describe the hash fusion and hash learning steps.

A. Problem Statement and Notation
Generally, a hash function can have a form based on a linear projection, kernel, spherical function, neural network, etc. However, the linear function (or its variations, such as kernel and bilinear functions) is one of the most popular hash function forms because it is very efficient and easily optimized. Additionally, nearly all methods utilizing a linear hash function can be extended to kernelized hash functions [13]. Therefore, the theoretical analysis and hash learning methods proposed in this paper are largely based on linear hash functions.

In this paper, boldface lowercase letters, such as h, denote vectors, and boldface uppercase letters, such as P, denote matrices. Furthermore, ||P|| and P^T are utilized to denote the ℓ2-norm and transpose of a matrix P, respectively. Boldface 1 denotes a vector in which all elements are one. A few additional notations utilized in the proposed FH method are listed in Table I.
TABLE I
NOTATIONS

N: number of samples
L: length of hash code
T: number of runs of a given hashing method
H_A: a given hashing method
b_i: the i-th row of the hash matrix
B_i: hash matrix obtained from the i-th run of hashing method H_A
B: final hash matrix of size L × N
X: original feature matrix
P: projection matrix between the hash matrix and the feature matrix

B. Hash Code Fusion
As discussed above, given a hashing method, we execute it T times on the training set to obtain T hash codes {B_i}_{i=1}^T. Next, we fuse these T hash codes into a final hash code B for the training samples. The motivation for hash code fusion is twofold. First, more accurate and stable codes can be obtained through hash code fusion. Second, the synergy and relationships between different hash codes can be exploited through fusion. In this paper, we propose two fusion strategies. To describe these strategies, we first present some definitions and theorems, then outline the specific processes of the two fusion strategies.

It is known that learning-based hashing attempts to preserve the similarity relationships between samples in the original space based on Hamming distance. Therefore, different objective functions have been designed based on similarity preservation. In such optimization problems, there is a trivial solution in which all the hash codes of the samples are the same (i.e., b_1 = b_2 = ... = b_N). To avoid this solution, the code balance condition was introduced in [13]: the number of data items mapped to each hash code must be the same. Bit balance and bit uncorrelation are utilized to approximate the code balance condition. Bit balance means that each bit has an approximately 50% chance of being +1 or −1. Bit uncorrelation means that different bits are uncorrelated. These two conditions are formulated as

B1 = 0,  BB^T = N·I,   (2)

where 1 is an N-dimensional all-ones vector and I is an identity matrix.

The property of code balance has proved to be very significant for hashing [14] [15]. In this study, to evaluate balance, we propose a definition called the balance degree.

Balance degree: Given a hash matrix B ∈ R^{L×N}, the balance degree of the i-th bit over the samples is defined as the absolute value of the sum of the i-th row of the hash matrix. For example, if the i-th row of the hash matrix is {−1, 1, 1, 1}, then the balance degree of the i-th bit is |−1 + 1 + 1 + 1| = 2. A smaller balance degree indicates better code balance.
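As an illustration of this definition, a minimal sketch (our own example, assuming NumPy; the function name is not from the paper) of how the balance degree of every bit row might be computed:

```python
import numpy as np

def balance_degrees(B):
    """Balance degree of each bit: |sum of the corresponding row of B|.

    B: (L, N) hash matrix with entries in {-1, +1}.
    Returns an (L,) vector; smaller values mean better-balanced bits.
    """
    return np.abs(B.sum(axis=1))

# The example row {-1, 1, 1, 1} has balance degree |(-1) + 1 + 1 + 1| = 2.
B = np.array([[-1,  1,  1,  1],
              [ 1, -1,  1, -1]])
print(balance_degrees(B))  # [2 0]
```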
Fig. 1. Flowchart of the proposed FH method (samples → hashing method run T times → bit-by-bit or code-by-code fusion → hash learning). The two branches represent the two fusion strategies discussed below; the hashing method can be replaced by other hashing algorithms.
We now present two theorems and their corresponding proofs, which are utilized in the proposed fusion strategies.

Theorem 1. Given a hash matrix B ∈ R^{L×N}, duplicate rows can be removed from the hash matrix because they have no influence on the preservation of semantics.

Proof: Assume there are two hash matrices B = [b_1; b_2; ...; b_L] and B' = [b_1; b_2; ...; b_L; b_L]. Compared to B, there is one duplicate row in B'. One can see that B'^T B' = B^T B + const; that is, after adding a duplicate row to the hash matrix B, the similarity between different hash codes remains unchanged, so the Hamming distances between different samples are unaffected. In other words, duplicate hash bits can be removed without any influence on the preservation of semantics.

Furthermore, assume P is a projection between the original features and the hash codes. The loss for hash learning can be simply calculated as follows:

min_P ||B − P^T X||^2 + λ||P||^2,  s.t. B ∈ {−1, 1}^{L×N}.   (3)

Setting the derivative of the objective function in Eq. (3) w.r.t. P to zero gives

XX^T P + λP − XB^T = 0.   (4)

The closed-form solution of P can be derived as

P = (XX^T + λI)^{−1} XB^T.   (5)

For hash matrices B and B', we have P_B = (XX^T + λI)^{−1} XB^T and P'_{B'} = (XX^T + λI)^{−1} XB'^T, respectively. We define Q = (XX^T + λI)^{−1} X. Then, P_B = QB^T and P'_{B'} = QB'^T = [P_B, Q b_L^T]. Finally,

||B' − P'_{B'}^T X||^2 = ||B − B Q^T X||^2 + ||b_L − b_L Q^T X||^2 = (1 + const) ||B − B Q^T X||^2 = (1 + const) ||B − P_B^T X||^2.   (6)

One can see that the fitting error is unchanged. In conclusion, there is no influence on semantic preservation when duplicate rows are removed from the hash matrix.

Theorem 2. Given a hash matrix B ∈ R^{L×N}, the hash bit rows of B can be reordered because their ordering has no influence on semantic preservation.

Proof: For the hash matrix B = [b_1; b_2; ...; b_L], let B' be a hash matrix whose hash bit rows are a random permutation of those of B. One can see that B'^T B' = B^T B. Therefore, the semantics are preserved even if the hash bits are out of order.

Furthermore, according to Eqs. (3) and (4), for hash matrices B and B' we have P_B = (XX^T + λI)^{−1} XB^T and P'_{B'} = (XX^T + λI)^{−1} XB'^T, respectively, and one can see that ||P_B|| = ||P'_{B'}||. We define Q = (XX^T + λI)^{−1} X. Then

||B − P_B^T X||^2 = ||B − B Q^T X||^2 = ||B(I − Q^T X)||^2 = ||BW||^2 = trace(W^T B^T B W) = trace(B^T B W W^T),   (7)

where W = I − Q^T X. Therefore, ||B − P_B^T X||^2 = ||B' − P'_{B'}^T X||^2 because B^T B = B'^T B'. That is to say,

||B − P_B^T X||^2 + λ||P_B||^2 = ||B' − P'_{B'}^T X||^2 + λ||P'_{B'}||^2.

One can see that the fitting error remains unchanged. In conclusion, there is no influence on semantic preservation when the hash bit rows of B are out of order.

Fig. 2. Illustration of the bit-by-bit fusion strategy. The first row of B comes from the first row of B_i, whose balance degree is the minimum among the corresponding rows of all hash matrices. The second and third rows are obtained similarly.

Fig. 3. Illustration of the code-by-code fusion strategy. We concatenate three hash matrices to obtain the matrix M, then select the three rows with the minimum balance degrees to construct the matrix B.

We now propose two novel fusion strategies to obtain more accurate and stable hash codes for the training samples.
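Before detailing the two strategies, the invariances established in Theorems 1 and 2 can be checked numerically; the short sketch below (our own illustration, assuming NumPy) verifies that B^T B is unchanged by permuting bit rows and changes only by an additive term when a row is duplicated.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.choice([-1, 1], size=(3, 6))        # L = 3 bits, N = 6 samples

# Theorem 2: permuting the bit rows leaves B^T B (pairwise code similarities) unchanged.
B_perm = B[rng.permutation(3), :]
assert np.array_equal(B.T @ B, B_perm.T @ B_perm)

# Theorem 1: duplicating the last row only adds the term b_L^T b_L to B^T B.
b_L = B[-1:, :]
B_dup = np.vstack([B, b_L])
assert np.array_equal(B_dup.T @ B_dup, B.T @ B + b_L.T @ b_L)
```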
1) Bit-by-bit fusion:
Given a hashing method H_A, after executing it T times on the training set, we obtain T different hash matrices {B_i}_{i=1}^T ∈ R^{L×N}, where B_i = (b_{il})_{l=1}^L and b_{il} ∈ R^{1×N}; that is, b_{il} is the l-th row of hash matrix B_i. The goal of hash fusion is to obtain an accurate and stable hash matrix B for all training samples from the T hash matrices {B_i}_{i=1}^T. Here, we propose a bit-by-bit fusion strategy based on the code balance condition. For the l-th bit (i.e., the l-th row of B) of all training samples, we first compute the balance degrees of the l-th bit in all hash matrices {B_i}_{i=1}^T, then select the row whose balance degree is the smallest among all hash matrices {B_i}_{i=1}^T, meaning we find the most balanced bit row among all hash matrices. If two or more rows have the same minimum balance degree, we empirically select the row in the first hash matrix; this situation is rarely seen when the number of samples is large. We repeat this process for all L rows to obtain a final hash matrix B. Additionally, if there are duplicate rows in hash matrix B, they are removed according to Theorem 1 to obtain more compact hash codes.

To demonstrate the bit-by-bit strategy, we present an example in Fig. 2, where T = 3, L = 3, and N = 6. For each row of the three hash matrices {B_i}_{i=1}^3, we compute a balance degree. The row with the smallest balance degree among the three hash matrices {B_i}_{i=1}^3 is selected as a row of B.
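A compact sketch of the bit-by-bit strategy as described above (our own illustration, assuming NumPy; ties are broken in favor of the earliest matrix, and the duplicate-row removal of Theorem 1 is omitted for brevity):

```python
import numpy as np

def fuse_bit_by_bit(B_list):
    """Bit-by-bit fusion: for each bit row l, keep the row with the smallest
    balance degree among the T hash matrices (the first matrix wins ties).

    B_list: list of T arrays, each (L, N) with entries in {-1, +1}.
    Returns the fused (L, N) hash matrix B.
    """
    stacked = np.stack(B_list)                 # (T, L, N)
    degrees = np.abs(stacked.sum(axis=2))      # (T, L) balance degree of every row
    best = degrees.argmin(axis=0)              # per bit, index of the most balanced matrix
    return np.stack([stacked[best[l], l, :] for l in range(stacked.shape[1])])

# Example with T = 3 matrices of L = 3 bits and N = 6 samples.
rng = np.random.default_rng(2)
B_list = [rng.choice([-1, 1], size=(3, 6)) for _ in range(3)]
print(fuse_bit_by_bit(B_list))
```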
2) Code-by-code fusion:
In this strategy, given T hash matrices {B_i}_{i=1}^T, in contrast to bit-by-bit fusion, we concatenate all hash matrices vertically (stacking their bit rows) in a random order. According to Theorem 2, the random ordering has no impact on semantic preservation, and we obtain a new matrix M ∈ R^{TL×N}. To obtain a hash code of length L, we then select the L rows of matrix M with the minimum balance degrees to construct the final hash matrix B. If there are duplicate rows in hash matrix B, they are removed according to Theorem 1 to obtain more compact hash codes.

We present an example in Fig. 3, where T = 3, L = 3, and N = 6. In this example, we first concatenate the hash matrices B_1, B_2, and B_3, then select the three rows with the minimum balance degrees to construct the final hash matrix B.
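Analogously, a minimal sketch of the code-by-code strategy (our own illustration, assuming NumPy; duplicate-row removal is again omitted):

```python
import numpy as np

def fuse_code_by_code(B_list, L=None):
    """Code-by-code fusion: stack all T hash matrices into M (T*L x N),
    then keep the L rows with the smallest balance degrees.

    B_list: list of T arrays, each (L, N) with entries in {-1, +1}.
    """
    M = np.concatenate(B_list, axis=0)             # (T*L, N) stacked bit rows
    if L is None:
        L = B_list[0].shape[0]
    degrees = np.abs(M.sum(axis=1))                # balance degree of every row of M
    keep = np.argsort(degrees, kind="stable")[:L]  # the L most balanced rows
    return M[keep, :]

rng = np.random.default_rng(3)
B_list = [rng.choice([-1, 1], size=(3, 6)) for _ in range(3)]
print(fuse_code_by_code(B_list))
```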
3) Discussion:
To adequately describe the motivation for the hash fusion strategies, we can consider a hash bit as a binary feature of the training samples, where a more balanced bit represents a better feature. Therefore, the goal of the two proposed fusion strategies is to replace bad binary features with good binary features. Additionally, according to Theorem 2, we can neglect the order of the hash bits. Therefore, good binary features (those with minimum balance degrees) can be sampled more than once, which is another motivation for the code-by-code fusion strategy.
C. Hash Learning
After obtaining a desirable hash matrix for the training samples, we learn a hash function for out-of-sample inputs. Generally, any type of hash function, such as a kernel, spherical function, neural network, or nonparametric function, can be utilized in this step. However, we utilized a linear hash function in this study. A simple form of the relevant optimization problem can be written as follows:

min_P ||B − P^T X||^2 + λ||P||^2.   (8)

The matrix B is obtained following hash code fusion, and the solution of P can be easily obtained by utilizing Eqs. (4) and (5). Based on the learned projection P, we can obtain hash codes for out-of-sample inputs by applying a sign function.
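The closed-form solution in Eq. (5) and the out-of-sample extension are straightforward to realize; a minimal sketch (our own illustration, assuming NumPy; the variable and function names are not from the paper) follows.

```python
import numpy as np

def learn_projection(X, B, lam=1.0):
    """Solve Eq. (5): P = (X X^T + lam*I)^(-1) X B^T for the linear hash function.

    X: (d, N) feature matrix of the training samples.
    B: (L, N) fused hash matrix.
    Returns P of shape (d, L).
    """
    d = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(d), X @ B.T)

def hash_out_of_sample(P, X_query):
    """Out-of-sample extension: codes = sgn(P^T x) for each query column."""
    return np.where(P.T @ X_query >= 0, 1, -1)

# Toy usage with random data (d = 8 features, N = 20 samples, L = 4 bits).
rng = np.random.default_rng(4)
X = rng.standard_normal((8, 20))
B = rng.choice([-1, 1], size=(4, 20))
P = learn_projection(X, B, lam=0.1)
print(hash_out_of_sample(P, X[:, :3]))
```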
In summary, the proposed FH method is presented in Algorithm 1.

Algorithm 1 Fusion Hashing (FH)
Input: training set X; hash code length L; a given hashing algorithm H_A; number of runs T.
1: Initialize P as a random matrix.
2: Execute hashing algorithm H_A T times to obtain T hash matrices.
3: Fuse the T hash matrices into a final hash matrix B utilizing bit-by-bit fusion or code-by-code fusion.
4: Utilize Eq. (5) to solve P.
Output: projection matrix P.

D. Time Complexity Analysis

We assume that the time complexity of a given hash algorithm H_A is C. Then, the time complexity for generating T hash codes is T · C. Additionally, the time complexity for solving the linear projection P is O(NdL + Nd^2), where d is the dimension of the input features. For hash fusion, the time complexity depends on the fusion strategy: the balance degree sorting process is typically no greater than O(TL lg L), and the balance degree computation takes O(NTL). Therefore, the time complexity of FH during training is O(NdL + Nd^2 + TL lg L + NTL) + T · C. Because L and T are much smaller than d and N, the training time complexity of FH can be rewritten as O(Nd^2) + T · C. In general, compared to the time complexity of the hash algorithm H_A, the time complexity for learning the linear projection P is negligible. Therefore, the proposed FH method is only T times as complex as the original hashing method, but it achieves superior precision. Additionally, the value of T is always small: our experiments show that a small T is sufficient, and precision increases only very slowly as T grows. Therefore, the proposed framework does not require significant extra expenditure in terms of time and space to achieve superior precision.
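Putting the steps of Algorithm 1 together, a self-contained sketch of the whole FH training procedure (our own illustration, assuming NumPy; the random-projection base hasher is only a stand-in for H_A, and the bit-by-bit variant is used):

```python
import numpy as np

def fh_train(X, L, base_hasher, T=3, lam=0.1, rng=None):
    """Fusion Hashing (Algorithm 1), bit-by-bit fusion variant.

    X: (d, N) training features; L: code length; base_hasher: callable that
    returns an (L, N) matrix in {-1, +1}; T: number of runs of the base hasher.
    Returns the projection matrix P of shape (d, L).
    """
    if rng is None:
        rng = np.random.default_rng()
    B_list = [base_hasher(X, L, rng) for _ in range(T)]   # step 2: run H_A T times
    stacked = np.stack(B_list)                            # step 3: bit-by-bit fusion
    best = np.abs(stacked.sum(axis=2)).argmin(axis=0)
    B = np.stack([stacked[best[l], l] for l in range(L)])
    d = X.shape[0]                                        # step 4: solve Eq. (5)
    return np.linalg.solve(X @ X.T + lam * np.eye(d), X @ B.T)

def lsh_like(X, L, rng):
    """Random-projection base hasher, used here only as a placeholder for H_A."""
    W = rng.standard_normal((X.shape[0], L))
    return np.where(W.T @ X >= 0, 1, -1)

rng = np.random.default_rng(5)
X = rng.standard_normal((16, 50))
P = fh_train(X, L=8, base_hasher=lsh_like, T=3, rng=rng)
print(np.where(P.T @ X[:, :2] >= 0, 1, -1))   # codes for two training samples
```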
III. EXPERIMENTS

In this section, we present our experimental settings and results. Three image datasets were utilized to evaluate the performance of the proposed method, and extensive experiments were conducted to evaluate the proposed framework. Our experiments were conducted on a computer with an Intel(R) Core(TM) i7-4790 CPU and 16 GB of RAM. The hyperparameter settings employed are listed in the experimental settings section.
A. Experimental Settings

1) Datasets:
We utilized three different image datasets, namely CIFAR-10 [16], MS-COCO [17], and NUS-WIDE [18], in our experiments. These datasets are widely used in image retrieval studies. CIFAR-10 is a single-label dataset containing 60,000 images that belong to 10 classes, with 6,000 images per class. We randomly selected 5,000 and 1,000 images (100 images per class) from the dataset as our training and testing sets, respectively.

The MS-COCO dataset is a multi-label dataset containing 82,783 images that belong to 91 categories. For the training image set, images with no category information were discarded, leaving 82,081 images. For MS-COCO, two images were defined as a similar pair if they shared at least one common label. We randomly selected 10,000 and 5,000 images from the dataset as our training and testing sets, respectively.

The NUS-WIDE dataset contains 269,648 web images associated with 1,000 tags. In this multi-label dataset, each image may be annotated with multiple labels. We only selected the 195,834 images belonging to the 21 most frequent concepts. For NUS-WIDE, two images were defined as a similar pair if they shared at least one common label. We randomly selected 10,500 (500 from each concept) and 2,100 (100 from each concept) images from the dataset as our training and testing sets, respectively.

In this study, we employed a convolutional neural network (CNN) model called CNN-F [19] to perform feature learning. The CNN-F model has also been applied in deep pairwise-supervised hashing [20] and asymmetric deep supervised hashing [21] for feature learning. The CNN-F model contains five convolutional layers and three fully connected layers; their details are provided in [19]. It should be noted that the FH framework is sufficiently general to allow other deep neural networks to replace the CNN-F model for feature learning; in this study, we employed CNN-F only for illustrative purposes. Additionally, a radial basis function was utilized to reduce the number of parameters: the 4,096-dimensional deep features extracted by the CNN-F model were mapped to 1,000 features.
2) Evaluation Metrics:
To evaluate the proposed method, we utilized an evaluation metric known as mean average precision (MAP), which is widely used in image retrieval evaluation. MAP is the mean of the average precision (AP) values obtained for the top retrieved samples:

MAP = (1/Q) Σ_{i=1}^{Q} AP(i),   (9)

where Q is the number of query images and AP(i) is the AP of the i-th query. AP is defined as

AP = (1/R) Σ_{r=1}^{G} precision(r) · σ(r),   (10)

where R is the number of relevant instances among the G retrieved samples, precision(r) is the precision of the top r retrieved results, and σ(r) = 1 if the r-th instance is relevant to the query and σ(r) = 0 otherwise.
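For completeness, a small sketch (our own, assuming NumPy) of how MAP in Eqs. (9)-(10) might be computed from binary relevance judgments over ranked retrieval lists:

```python
import numpy as np

def average_precision(relevant, G=None):
    """AP of one query per Eq. (10): (1/R) * sum_r precision(r) * sigma(r).

    relevant: binary sequence over the ranked retrieval list (1 = relevant).
    G: number of top retrieved samples to evaluate (defaults to the full list).
    """
    rel = np.asarray(relevant[:G], dtype=float)
    R = rel.sum()
    if R == 0:
        return 0.0
    precision_at_r = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision_at_r * rel).sum() / R)

def mean_average_precision(relevance_lists, G=None):
    """MAP per Eq. (9): mean of AP over all Q queries."""
    return float(np.mean([average_precision(r, G) for r in relevance_lists]))

# Two toy queries with ranked binary relevance judgments.
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 1]]))
```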
B. Experimental Results and Analysis

We applied the proposed FH framework to the following methods: LSH [22], spectral hashing (SH) [23], principal component analysis with iterative quantization (PCA-ITQ) [24], PCA with random rotation (PCA-RR) [24], supervised discrete hashing (SDH) [3], column sampling based discrete supervised hashing (COSDISH) [1], and fast supervised discrete hashing (FSDH) [25]. LSH is a data-independent method; SH, PCA-ITQ, and PCA-RR are unsupervised hashing methods; all the other methods are supervised hashing methods. All of the hyperparameters were initialized as suggested in the original publications. The proposed FH is a data-dependent framework; however, it can also be applied to data-independent methods such as LSH.

The proposed FH with the bit-by-bit strategy is denoted FHBB, whereas FH with the code-by-code strategy is denoted FHCC. We executed the original methods three times each; in other words, we set T = 3.

Table II lists the MAP scores of the data-independent LSH method with hash lengths ranging from 24 to 64 bits.
TABLE II
PERFORMANCE IN TERMS OF MAP SCORE WITH THE DATA-INDEPENDENT METHOD

Method   CIFAR-10 (24/48/64 bits)      MS-COCO (24/48/64 bits)       NUS-WIDE (24/48/64 bits)
LSH_1    0.2604  0.2942  0.3101        0.6093  0.7121  0.7145        0.4095  0.5968  0.5934
LSH_2    0.2600  0.2901  0.3027        0.6338  0.6715  0.6558        0.5407  0.6109  0.5903
LSH_3    0.2704  0.2908  0.2898        0.6404  0.6548  0.7051        0.4754  0.5814  0.5969
FHBB
FHCC
TABLE III
PERFORMANCE IN TERMS OF MAP SCORE WITH UNSUPERVISED METHODS

Method      CIFAR-10 (24/48/64 bits)      MS-COCO (24/48/64 bits)       NUS-WIDE (24/48/64 bits)
PCA-ITQ_1   0.3418  0.3502  0.3547        0.6273  0.6967  0.7166        0.4048  0.6169  0.6390
PCA-ITQ_2   0.3429  0.3539  0.3555        0.6276  0.6917  0.7154        0.4054  0.6171  0.6471
PCA-ITQ_3   0.3354  0.3461  0.3521        0.6330  0.6907  0.7152        0.4057  0.6109  0.6382
FHBB        0.3172
FHCC        0.3296
PCA-RR_1    0.2987  0.3114  0.3222        0.6596  0.6781  0.6970        0.4596  0.5941  0.5961
PCA-RR_2    0.2787  0.3221  0.3285        0.6205  0.6865  0.6863        0.5001  0.6059  0.5573
PCA-RR_3    0.3017  0.3204  0.3235        0.5561  0.7194  0.7397        0.4960  0.5990  0.6238
FHBB
FHCC
SH_1        0.2908  0.2961  0.2992        0.6616  0.6501  0.6659        0.6070  0.5986  0.5934
SH_2        0.2908  0.2961  0.2992        0.6616  0.6501  0.6659        0.6070  0.5986  0.5934
SH_3        0.2908  0.2961  0.2992        0.6616  0.6501  0.6659        0.6070  0.5986  0.5934
FHBB
FHCC        0.2689  0.2665  0.2760        0.5536  0.6051  0.6343        0.5210  0.5521
TABLE IV
PERFORMANCE IN TERMS OF MAP SCORE WITH SUPERVISED METHODS

Method      CIFAR-10 (24/48/64 bits)      MS-COCO (24/48/64 bits)       NUS-WIDE (24/48/64 bits)
SDH_1       0.2333  0.4622  0.4007        0.8026  0.8413  0.5948        0.6837  0.7045  0.7036
SDH_2       0.2236  0.5346  0.4797        0.6280  0.7204  0.8452        0.6888  0.7611  0.6844
SDH_3       0.2640  0.4322  0.3089        0.8019  0.5156  0.8219        0.5004  0.6683  0.6926
FHBB        0.2013  0.5004  0.3618
FHCC        0.2124  0.4090  0.2932
COSDISH_1   0.4566  0.5034  0.5269        0.5082  0.5900  0.6563        0.3633  0.4192  0.4007
COSDISH_2   0.4795  0.5034  0.5143        0.5850  0.6890  0.6505        0.4351  0.4830  0.4531
COSDISH_3   0.4817  0.5268  0.5184        0.4967  0.6006  0.6078        0.4452  0.4555  0.4894
FHBB
FHCC
FSDH_1      0.6444  0.6798  0.6838        0.8122  0.8246  0.8209        0.7750  0.7756  0.7866
FSDH_2      0.6443  0.6687  0.6872        0.7810  0.8232  0.8371        0.7750  0.7865  0.7878
FSDH_3      0.6324  0.7006  0.7019        0.8151  0.8218  0.8234        0.7705  0.7798  0.7873
FHBB
FHCC
One can see that the performance was improved by applying the proposed FH to LSH on all three benchmark datasets. FHBB and FHCC resulted in similar performance, and both achieved improvements of 4% or more. In Fig. 4, which shows Precision@5000 with different numbers of hash bits, method-abbreviation_i denotes the i-th run of the corresponding method (e.g., LSH_1). Five methods were tested, and we executed each original method three times. One can see that the proposed FHBB and FHCC almost always produce superior precision, except for FHBB with the SH method.

Figs. 5 and 6 show the Precision@5000 on the three benchmark datasets with hash lengths ranging from 24 to 128 bits and the number of runs ranging from 2 to 6, using the fusion strategies FHBB and FHCC, respectively. Five methods are shown, limited by space. It can be seen that the precision of the hashing methods is improved by the proposed fusion strategies as the number of runs and the number of hash bits grow. However, the precision increases only slowly with a larger number of runs, which indicates that the proposed framework does not need much extra expenditure in terms of time and space to obtain superior precision.

IV. CONCLUSION
In this study, we proposed a general framework called FH to facilitate the self-improvement of various hashing methods. Generally, the proposed framework can be applied to existing hashing methods without adding new constraint terms. In the proposed framework, we implemented two fusion strategies to obtain more accurate and stable hash codes from a given original hashing method, and we then learn a simple linear projection for out-of-sample inputs. Experiments conducted on three benchmark datasets demonstrated the superior performance of the proposed framework.
REFERENCES

[1] W.-C. Kang, W.-J. Li, and Z.-H. Zhou, "Column sampling based discrete supervised hashing," in AAAI, 2016, pp. 1230–1236.
[2] Z. Yu, F. Wu, Y. Yang, Q. Tian, J. Luo, and Y. Zhuang, "Discriminative coupled dictionary hashing for fast cross-media retrieval," in Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 2014, pp. 395–404.
[3] F. Shen, C. Shen, W. Liu, and H. T. Shen, "Supervised discrete hashing," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 37–45.
[4] Y. Weiss, A. Torralba, and R. Fergus, "Spectral hashing," in International Conference on Neural Information Processing Systems, 2008, pp. 1753–1760.
[5] M. Raginsky and S. Lazebnik, "Locality-sensitive binary codes from shift-invariant kernels," in Advances in Neural Information Processing Systems, 2009, pp. 1509–1517.
[6] A. Dasgupta, R. Kumar, and T. Sarlós, "Fast locality-sensitive hashing," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2011, pp. 1073–1081.
[7] J. Ji, J. Li, S. Yan, Q. Tian, and B. Zhang, "Min-max hash for Jaccard similarity," in IEEE International Conference on Data Mining, 2014, pp. 301–309.
[8] J. Wang, S. Kumar, and S.-F. Chang, "Semi-supervised hashing for large-scale search," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 12, pp. 2393–2406, 2012.
[9] W. Liu, J. Wang, R. Ji, and Y. G. Jiang, "Supervised hashing with kernels," in Computer Vision and Pattern Recognition, 2012, pp. 2074–2081.
[10] G. Lin, C. Shen, and A. van den Hengel, "Supervised hashing using graph cuts and boosted decision trees," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 11, pp. 2317–2331, 2015.
[11] Q. Wang, Z. Zhang, and L. Si, "Ranking preserving hashing for fast similarity search," in IJCAI, 2015, pp. 3911–3917.
[12] J. Gui, T. Liu, Z. Sun, D. Tao, and T. Tan, "Supervised discrete hashing with relaxation," IEEE Transactions on Neural Networks and Learning Systems, 2016.
[13] J. Wang, T. Zhang, J. Song, N. Sebe, and H. T. Shen, "A survey on learning to hash," arXiv preprint arXiv:1606.00185, 2016.
[14] F. Shen, X. Gao, L. Liu, Y. Yang, and H. T. Shen, "Deep asymmetric pairwise hashing," in Proceedings of the 2017 ACM on Multimedia Conference. ACM, 2017, pp. 1522–1530.
[15] Q.-Y. Jiang and W.-J. Li, "Deep cross-modal hashing," arXiv preprint, 2016.
[16] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Citeseer, Tech. Rep., 2009.
[17] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision. Springer, 2014, pp. 740–755.
[18] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng, "NUS-WIDE: A real-world web image database from National University of Singapore," in Proceedings of the ACM International Conference on Image and Video Retrieval. ACM, 2009, p. 48.
[19] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, "Return of the devil in the details: Delving deep into convolutional nets," arXiv preprint arXiv:1405.3531, 2014.
[20] W.-J. Li, S. Wang, and W.-C. Kang, "Feature learning based deep supervised hashing with pairwise labels," in IJCAI, 2017.
[21] Q.-Y. Jiang and W.-J. Li, "Asymmetric deep supervised hashing," in Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018.
[22] A. Gionis, P. Indyk, and R. Motwani, "Similarity search in high dimensions via hashing," in International Conference on Very Large Data Bases, vol. 8, no. 2, pp. 518–529, 1999.
[23] Y. Weiss, A. Torralba, and R. Fergus, "Spectral hashing," in Advances in Neural Information Processing Systems, 2009, pp. 1753–1760.
[24] Y. Gong and S. Lazebnik, "Iterative quantization: A Procrustean approach to learning binary codes," in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 817–824.
[25] J. Gui, T. Liu, Z. Sun, D. Tao, and T. Tan, "Fast supervised discrete hashing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 2, pp. 490–496, 2018.
Fig. 4. Precision@5000 with different numbers of hash bits on the three benchmark datasets (from top to bottom: LSH, PCA-RR, SH, COSDISH, FSDH; from left to right: CIFAR-10, MS-COCO, NUS-WIDE).

Fig. 5. Precision@5000 with different settings of the number of hash bits and the number of runs on the three benchmark datasets using FHBB (from top to bottom: LSH, PCA-RR, SH, COSDISH, FSDH).

Fig. 6. Precision@5000 with different settings of the number of hash bits and the number of runs on the three benchmark datasets using FHCC.