TSINGHUA SCIENCE AND TECHNOLOGY
ISSN 1007-0214 0?/?? pp???–???
DOI: 10.26599/TST.2018.9010000
Volume 1, Number 1, September 2018
Ranking with Adaptive Neighbors
Muge Li, Liangyue Li, and Feiping Nie∗

Abstract:
Retrieving the most similar objects in a large-scale database for a given query is a fundamental building block in many application domains, ranging from web search, visual retrieval, and cross-media retrieval to document retrieval. State-of-the-art approaches have mainly focused on capturing the underlying geometry of the data manifolds. Graph-based approaches, in particular, define various diffusion processes on weighted data graphs. Despite their success, these approaches rely on fixed-weight graphs, making the ranking sensitive to the input affinity matrix. In this study, we propose a new ranking algorithm that simultaneously learns the data affinity matrix and the ranking scores. The proposed optimization formulation assigns adaptive neighbors to each data point based on the local connectivity, and the smoothness constraint assigns similar ranking scores to similar data points. We develop a novel and efficient algorithm to solve the optimization problem. Evaluations using synthetic and real datasets suggest that the proposed algorithm can outperform the existing methods.
Key words:
Ranking; Adaptive neighbors; Manifold structure
1 Introduction

Retrieving the most similar objects in a large-scale database for a given query is a fundamental building block in many application domains, ranging from web search [1], visual retrieval [2–6], and cross-media retrieval [7] to document retrieval [8]. The most straightforward approach to such retrieval tasks is to compute the pairwise similarities between objects in the Euclidean space as the ranking scores.

• Muge Li is with Cixi Hanvos Yucai High School, Ningbo, China, 315300. E-mail: [email protected].
• Liangyue Li is with the School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, AZ, US, 85281. E-mail: [email protected].
• Feiping Nie is with the School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an, China, 710072. E-mail: [email protected].
∗ To whom correspondence should be addressed. Manuscript received: 2017-06-25; accepted: 2017-08-24.
Nonetheless, high-dimensional data often lie on a nonlinear manifold [9, 10]. The Euclidean distance based approach largely ignores the intrinsic manifold structure and might degrade the retrieval performance. State-of-the-art methods mainly focus on capturing the underlying geometry of the data manifold. The most common way is to first represent the data manifold using a weighted graph, wherein each vertex is a data object and the edge weights are proportional to the pairwise similarities. All the vertices then repeatedly spread their affinities to their neighborhoods via the weighted graph until a global stable state is reached. The various diffusion processes mainly differ in the transition matrix and the affinity update scheme [5]. Among others, the random walk transition matrix is widely used in PageRank [1], random walk with restart [11], self-diffusion [12], label propagation [13], and graph transduction [14]. The random walk transition matrix is a row-stochastic matrix such that the transition probability is proportional to the edge weights. A slight variant is the symmetric normalized transition matrix used in the Ranking on Data Manifold method [15]. To reduce the effect of noisy nodes, random walks can be restricted to the k nearest neighbors by sparsifying the original weighted graph [16, 17]. For iterative update of the affinities, the random walk with restart allows the random surfer to jump to an arbitrary node at random. The modified diffusion process on the standard graph captures high-order relations [17] and is equivalent to the diffusion process on the Kronecker product graph [18]. Despite their success, graph-based ranking methods rely on fixed-weight graphs, making the ranking results sensitive to the input affinity matrix.

In this study, we propose the ranking with adaptive neighbors (RAN) algorithm, which simultaneously learns the data affinity matrix and the ranking scores. The proposed optimization pursues two objectives. First, data points with smaller distances in the Euclidean space have a higher chance of being neighbors, i.e., of being more similar. In contrast to other graph-based ranking methods, the similarity is not computed a priori but is learned while optimizing the ranking scores; consequently, the neighbors of each datum are adaptively assigned. Second, similar data points have similar ranking scores. This is essentially the smoothness constraint in graph transduction methods [19]. We develop a novel and efficient algorithm to solve the optimization problem. Evaluations using synthetic and real datasets suggest that the proposed ranking algorithm outperforms existing methods.

In Section 2, we present the proposed RAN algorithm. In Section 3, we discuss the empirical evaluation results, and in Section 4, we summarize the conclusions.
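As a concrete illustration of the row-stochastic transition matrix underlying these diffusion methods, the short NumPy sketch below (ours, not taken from any cited implementation) normalizes a nonnegative affinity matrix $W$ into $P = D^{-1} W$, so that each row of $P$ sums to one and stepping from node $i$ to node $j$ has probability proportional to the edge weight $w_{ij}$.

```python
import numpy as np

def row_stochastic_transition(W):
    """Normalize a nonnegative affinity matrix W so each row sums to 1.

    A minimal illustration of the random-walk transition matrix
    P = D^{-1} W used by diffusion-based ranking methods.
    """
    d = W.sum(axis=1, keepdims=True)   # node degrees
    d[d == 0] = 1.0                    # guard against isolated nodes
    return W / d
```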
Notations:
Throughout the paper, matrices are written as upper-case letters. For a matrix $M$, the $i$-th row and the $(i, j)$-th element of $M$ are denoted by $m_i$ and $m_{ij}$, respectively. An identity matrix is denoted by $I$, and $\mathbf{1}$ denotes the column vector with all elements equal to one. For a vector $v$ and a matrix $M$, $v \ge 0$ and $M \ge 0$ mean that all the elements of $v$ and $M$ are nonnegative.

2 Ranking with Adaptive Neighbors

In this section, we present the RAN algorithm and then the optimization approach for solving its objective function.
2.1 Problem Formulation

Given a set of data points $X = \{x_1, x_2, \dots, x_N\} \subseteq \mathbb{R}^d$ and a query indicator vector $y = [y_1, y_2, \dots, y_N]^T \in \{0, 1\}^N$, where $y_i = 1$ if $x_i$ is a query and $y_i = 0$ otherwise, the task is to find a function $f$ that assigns each data point $x_i$ a ranking score $f_i \in \mathbb{R}$ according to its relevance to the queries. We explore the local connectivity of each point for ranking purposes and, in particular, consider the $k$-nearest points as the neighbors of a given node.

Data points separated by small distances in the Euclidean space have a high chance of being neighbors. We denote by $s_{ij}$ the probability that the $i$-th data point $x_i$ and the $j$-th data point $x_j$ are neighbors. Intuitively, if the two data points are separated by a small distance, i.e., $\|x_i - x_j\|_2$ is small, then their probability $s_{ij}$ of being connected should be high. One way to find the probabilities $\{s_{ij}\}_{j=1}^{N}$ is to solve the following optimization problem:

$$\min_{s_i^T \mathbf{1} = 1,\ 0 \le s_i \le 1}\ \sum_{j=1}^{N} \|x_i - x_j\|_2^2\, s_{ij} \tag{1}$$

where $s_i \in \mathbb{R}^N$ is the vector whose $j$-th element is $s_{ij}$. Nonetheless, the above optimization problem has a trivial solution: $s_{ij} = 1$ for the data point $x_j$ nearest to $x_i$, and $s_{ij} = 0$ otherwise. This can be addressed by adding an $\ell_2$-norm regularization on $s_i$, which drags $s_i$ closer to the center of mass of the simplex defined by $s_i^T \mathbf{1} = 1$, $0 \le s_i \le 1$. This slight modification gives the following optimization problem:

$$\min_{s_i^T \mathbf{1} = 1,\ 0 \le s_i \le 1}\ \sum_{j=1}^{N} \left( \|x_i - x_j\|_2^2\, s_{ij} + \gamma s_{ij}^2 \right) \tag{2}$$

where the second term is the regularization term and $\gamma$ is the regularization parameter.

For each data point $x_i$, we compute its probability of connecting to the other data points using Eq. (2). As a result, we assign the neighbors of all the data points by solving the following problem:

$$\min_{\forall i,\ s_i^T \mathbf{1} = 1,\ 0 \le s_i \le 1}\ \sum_{i,j=1}^{N} \left( \|x_i - x_j\|_2^2\, s_{ij} + \gamma s_{ij}^2 \right) \tag{3}$$

Similar data points should have similar ranking scores; this is essentially a smoothness constraint over the data graph. Let $S \in \mathbb{R}^{N \times N}$ be the similarity matrix obtained from assigning the neighbors, whose $i$-th row is $s_i^T$. We write the smoothness constraint as

$$\sum_{i,j=1}^{N} (f_i - f_j)^2 s_{ij} = 2 f^T L_S f \tag{4}$$

where $f$ is the vector of ranking scores of all the data points, $L_S = D_S - \frac{S^T + S}{2}$ is the Laplacian matrix of the affinity matrix, and the degree matrix $D_S$ is a diagonal matrix whose $i$-th diagonal element is $\sum_j (s_{ij} + s_{ji})/2$.

Combining the above and using the information from the query, we derive the final objective function:

$$\min_{S, f}\ \sum_{i,j=1}^{N} \left( \|x_i - x_j\|_2^2\, s_{ij} + \gamma s_{ij}^2 \right) + 2\lambda f^T L_S f + (f - y)^T U (f - y)$$
$$\text{s.t.}\ \forall i,\ s_i^T \mathbf{1} = 1,\ 0 \le s_i \le 1 \tag{5}$$

where $U$ is a diagonal matrix with $U_{ii} = \infty$ (a large constant) if $x_i$ is a query and $U_{ii} = 1$ otherwise. The last term is equivalent to $\sum_{i=1}^{N} U_{ii} (f_i - y_i)^2$ and makes the ranking results consistent with the queries. The queries receive much larger weights because they reflect the user's search intentions; for non-queried examples, we do not know a priori whether they meet the user's intentions, so they receive lower weights. Eq. (5) is not easy to solve because $L_S = D_S - \frac{S^T + S}{2}$ and $D_S$ both depend on the similarity matrix $S$. In the next subsection, we propose a novel and efficient algorithm to solve this problem.
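To make Eqs. (4) and (5) concrete, the following NumPy sketch (our illustration; the paper ships no code) builds the Laplacian $L_S = D_S - (S^T + S)/2$ from a learned affinity matrix and evaluates the smoothness term $2\lambda f^T L_S f$.

```python
import numpy as np

def laplacian_from_affinity(S):
    """Build L_S = D_S - (S^T + S)/2 from a (possibly asymmetric) affinity S.

    D_S is diagonal with (D_S)_ii = sum_j (s_ij + s_ji) / 2, so L_S is the
    graph Laplacian of the symmetrized affinities (S + S^T) / 2.
    """
    W = 0.5 * (S + S.T)            # symmetrized affinities
    D = np.diag(W.sum(axis=1))     # degree matrix D_S
    return D - W

def smoothness(f, S, lam):
    """Evaluate the smoothness term 2 * lambda * f^T L_S f from Eq. (4)."""
    L = laplacian_from_affinity(S)
    return 2.0 * lam * f @ L @ f
```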
2.2 Optimization

We propose to solve Eq. (5) via an alternating optimization approach. We first fix $S$; the problem then reduces to

$$\min_f\ 2\lambda f^T L_S f + (f - y)^T U (f - y) \tag{6}$$

Taking the derivative of the above objective w.r.t. $f$ and setting it to zero, we obtain the linear equation

$$(2\lambda L_S + U) f = U y \tag{7}$$

whose solution is easily obtained as $f = (2\lambda L_S + U)^{-1} U y$.

When $f$ is fixed, Eq. (5) transforms to

$$\min_S\ \sum_{i,j=1}^{N} \left( \|x_i - x_j\|_2^2\, s_{ij} + \gamma s_{ij}^2 \right) + 2\lambda f^T L_S f \tag{8}$$
$$\text{s.t.}\ \forall i,\ s_i^T \mathbf{1} = 1,\ 0 \le s_i \le 1 \tag{9}$$

Based on Eq. (4), this can be written as

$$\min_S\ \sum_{i,j=1}^{N} \left( \|x_i - x_j\|_2^2\, s_{ij} + \gamma s_{ij}^2 + \lambda (f_i - f_j)^2 s_{ij} \right) \quad \text{s.t.}\ \forall i,\ s_i^T \mathbf{1} = 1,\ 0 \le s_i \le 1 \tag{10}$$

Because the summations are independent of each other for a given $i$, we can solve the following subproblem individually for each $i$:

$$\min_{s_i}\ \sum_{j=1}^{N} \left( \|x_i - x_j\|_2^2\, s_{ij} + \gamma s_{ij}^2 + \lambda (f_i - f_j)^2 s_{ij} \right) \quad \text{s.t.}\ s_i^T \mathbf{1} = 1,\ 0 \le s_i \le 1 \tag{11}$$

Denote $d_{ij}^x = \|x_i - x_j\|_2^2$ and $d_{ij}^f = (f_i - f_j)^2$, and let $d_i \in \mathbb{R}^N$ be the vector whose $j$-th element is $d_{ij} = d_{ij}^x + \lambda d_{ij}^f$. Allowing a per-point regularization parameter $\gamma_i$, Eq. (11) can be reformulated as

$$\min_{s_i^T \mathbf{1} = 1,\ 0 \le s_i \le 1}\ \left\| s_i + \frac{d_i}{2\gamma_i} \right\|_2^2 \tag{12}$$

Next, we show how to solve this problem in closed form using the method of Lagrange multipliers. The Lagrangian function of the problem is

$$\mathcal{L}(s_i, \eta, \beta_i) = \frac{1}{2} \left\| s_i + \frac{d_i}{2\gamma_i} \right\|_2^2 - \eta (s_i^T \mathbf{1} - 1) - \beta_i^T s_i \tag{13}$$

where $\eta$ and $\beta_i$ are the nonnegative Lagrange multipliers. According to the KKT conditions, the optimal solution is

$$s_{ij} = \left( -\frac{d_{ij}}{2\gamma_i} + \eta \right)_+ \tag{14}$$

where $(x)_+$ is shorthand for $\max\{x, 0\}$.

It is often desirable to focus on the locality of each point, as this can reduce the effect of noisy data and boost performance in practice [20]. In this study, we learn a sparse vector $s_i$ and allow $x_i$ to connect only to its $k$-nearest neighbors. Such sparsification of $S$ also minimizes the computational cost.

We sort the elements of $d_i$ in ascending order such that $d_{i1} \le d_{i2} \le \dots \le d_{iN}$. To learn a sparse $s_i$ with only $k$ nonzero elements, we require, from Eq. (14), that $s_{ik} > 0$ and $s_{i,k+1} = 0$. Therefore

$$-\frac{d_{ik}}{2\gamma_i} + \eta > 0, \qquad -\frac{d_{i,k+1}}{2\gamma_i} + \eta \le 0 \tag{15}$$

Considering the constraint $s_i^T \mathbf{1} = 1$, we obtain

$$\sum_{j=1}^{k} \left( -\frac{d_{ij}}{2\gamma_i} + \eta \right) = 1 \ \Rightarrow\ \eta = \frac{1}{k} + \frac{1}{2k\gamma_i} \sum_{j=1}^{k} d_{ij} \tag{16}$$

Substituting Eq. (16) into Eq. (15), we obtain the following inequality for $\gamma_i$:

$$\frac{k}{2} d_{ik} - \frac{1}{2} \sum_{j=1}^{k} d_{ij} < \gamma_i \le \frac{k}{2} d_{i,k+1} - \frac{1}{2} \sum_{j=1}^{k} d_{ij} \tag{17}$$

For the objective function in Eq. (12) to have an optimal solution $s_i$ with exactly $k$ nonzero elements, we set $\gamma_i$ to

$$\gamma_i = \frac{k}{2} d_{i,k+1} - \frac{1}{2} \sum_{j=1}^{k} d_{ij} \tag{18}$$

The overall $\gamma$ is set to the mean of all the $\gamma_i$:

$$\gamma = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{k}{2} d_{i,k+1} - \frac{1}{2} \sum_{j=1}^{k} d_{ij} \right) \tag{19}$$

The algorithm for solving the optimization problem in Eq. (5) is summarized in Algorithm 1.
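Both alternating steps admit direct implementations. Below is a hedged NumPy sketch of the two updates, under our reading of the equations above: `update_f` solves the linear system of Eq. (7), and `update_s_row` applies the closed-form simplex solution of Eqs. (14), (16), and (18) to one row of $S$.

```python
import numpy as np

def update_f(L_S, U_diag, y, lam):
    """f-step, Eq. (7): solve (2*lam*L_S + U) f = U y for the scores f."""
    A = 2.0 * lam * L_S + np.diag(U_diag)
    return np.linalg.solve(A, U_diag * y)

def update_s_row(d_i, k):
    """s-step, Eqs. (14)-(18): closed-form k-sparse row of S.

    d_i holds d_ij = ||x_i - x_j||^2 + lam * (f_i - f_j)^2, with the
    self-distance d_ii set to +inf so a point never picks itself.
    Returns the row s_i and the per-point regularizer gamma_i.
    """
    d_sorted = np.sort(d_i)                                  # ascending
    gamma_i = 0.5 * (k * d_sorted[k] - d_sorted[:k].sum())   # Eq. (18)
    gamma_i = max(gamma_i, 1e-12)                            # guard ties
    eta = 1.0 / k + d_sorted[:k].sum() / (2.0 * k * gamma_i) # Eq. (16)
    s = np.maximum(-d_i / (2.0 * gamma_i) + eta, 0.0)        # Eq. (14)
    return s, gamma_i
```

With $\gamma_i$ set as in Eq. (18), the row $s$ has exactly $k$ nonzero entries and sums to one, matching the derivation above.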
Algorithm 1  Algorithm to solve the problem in Eq. (5)

Input: (1) Data matrix $X \in \mathbb{R}^{N \times d}$; (2) query indicator vector $y$; (3) parameters $\gamma$, $\lambda$.
Output: The ranking scores $f$.

Initialize $S$ and compute $L_S$ accordingly;
while not converged do
    Define the diagonal matrix $U$ as $U_{ii} = \infty$ if $y_i = 1$ and $U_{ii} = 1$ otherwise;
    Update $f$ by solving Eq. (7) as $f = (2\lambda L_S + U)^{-1} U y$;
    for $i = 1, \dots, N$ do
        Update the $i$-th row of $S$ by solving Eq. (12);
    end for
    Recompute $L_S$ from the updated $S$;
end while
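For completeness, a compact NumPy sketch of Algorithm 1 follows. It is our interpretation rather than the authors' reference code; it reuses `laplacian_from_affinity`, `update_f`, and `update_s_row` from the earlier sketches, and substitutes a large finite `query_weight` (our choice) for $U_{ii} = \infty$.

```python
import numpy as np
from scipy.spatial.distance import cdist

def ran(X, y, k=5, lam=1.0, n_iters=30, query_weight=1e8):
    """Ranking with Adaptive Neighbors (Algorithm 1), as we understand it.

    X: (N, d) data matrix; y: (N,) binary query indicator.
    Returns the ranking scores f.
    """
    N = X.shape[0]
    dx = cdist(X, X, metric="sqeuclidean")   # d^x_ij = ||x_i - x_j||^2
    np.fill_diagonal(dx, np.inf)             # forbid self-neighbors
    U_diag = np.where(y == 1, query_weight, 1.0)

    # Initialize S from distances alone (the lam = 0 case of Eq. (12)).
    S = np.vstack([update_s_row(dx[i], k)[0] for i in range(N)])
    f = y.astype(float)
    for _ in range(n_iters):
        L_S = laplacian_from_affinity(S)
        f = update_f(L_S, U_diag, y, lam)               # Eq. (7)
        df = (f[:, None] - f[None, :]) ** 2             # d^f_ij
        d = dx + lam * df
        S = np.vstack([update_s_row(d[i], k)[0] for i in range(N)])
    return f
```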
3 Experiments

In this section, we show the performance of the proposed ranking algorithm RAN (Algorithm 1) on synthetic and real-world datasets.

3.1 Synthetic Datasets

We randomly generate two synthetic datasets constructed as two-moons (Fig. 1) and three-rings (Fig. 2) patterns. A query, marked by a red cross, is given in the upper moon and in the innermost ring. The task is to rank the remaining data points according to their relevance to the query. We represent the ranking scores returned by RAN using the diameter of the data points, such that larger points are more relevant. From Fig. 1, we observe that the ranking scores gradually decrease along the upper moon. The same decreasing trend is also observed in the lower moon. In addition, the ranking scores in the upper moon are generally much higher than those in the lower moon. This ranking outcome is intuitively expected. We make similar observations for the three rings in Fig. 2. The data points in the innermost ring are more relevant than those in the middle ring, which in turn are more relevant than those in the outermost ring. These results clearly show that the proposed RAN captures the underlying manifold well.

Fig. 1  Ranking example using the two-moons pattern.
Fig. 2  Ranking example using the three-rings pattern.
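A quick way to reproduce a two-moons experiment of this kind (with our own data-generation parameters, since the paper does not list them) is via scikit-learn's `make_moons`, feeding one point of the first moon as the query to the `ran` sketch above.

```python
# Illustrative two-moons demo; assumes the ran() sketch defined earlier.
import numpy as np
from sklearn.datasets import make_moons

X, labels = make_moons(n_samples=300, noise=0.05, random_state=0)
y = np.zeros(len(X))
y[np.where(labels == 0)[0][0]] = 1   # one point of the first moon as query

f = ran(X, y, k=5, lam=1.0)
print("mean score, query moon:", f[labels == 0].mean())
print("mean score, other moon:", f[labels == 1].mean())
```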
3.2 Real Datasets

We compare the retrieval performance on three real image datasets: Yale [21], ORL [22], and USPS [23].

YALE: Yale contains face images of 15 subjects at different poses and under different illumination conditions. We extract 11 images, taken under different conditions, for each of the 15 subjects. Each image is down-sampled and normalized to zero mean and unit variance. The bandwidth for constructing the weighted graph for the graph-based baselines is $\sigma = 0.$ We set $k = 5$ and $\lambda = 90$ for RAN.

ORL: ORL contains 400 images, with ten different images for each of 40 subjects. The bandwidth for constructing the weighted graph for the graph-based baselines is $\sigma = 20$. We set $k = 5$ and $\lambda = 0.$ for RAN.

USPS: This dataset collects images of handwritten digits (0-9) from envelopes of the U.S. Postal Service. We extract 40 images for each digit and normalize them to 16 × 16 pixels in grayscale. The bandwidth for constructing the weighted graph for the graph-based baselines is $\sigma = 0.$ We set $k = 10$ and $\lambda = 1.$ for RAN.

On all the datasets, we use each image as a query and measure the retrieval accuracy by ranking all the other images. We compare the proposed RAN algorithm with the Euclidean distance baseline and several other diffusion methods, including self-diffusion (SD) [12], Personalized PageRank (PPR) [24], Manifold Ranking [15], and Graph Transduction (GT) [14]. The results are shown in Tables 1, 2, and 3. From the results, we can see that the proposed RAN algorithm consistently outperforms all the other methods. The straightforward Euclidean distance baseline is the worst because it ignores the manifold structure in the data. The various diffusion-based methods capture the manifold information to a certain extent, but they assume that the weighted data graph is fixed; we instead adaptively learn a localized weighted graph optimized for ranking. To study how the locality of the graph, i.e., the number of neighbors $k$, affects the retrieval performance, Fig. 3 shows the retrieval performance as the number of neighbors varies on the USPS dataset. As can be seen, it is important to select a reasonable value of $k$ for the retrieval. For USPS, the best performance is achieved at $k = 15$.
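The precision and recall columns in Tables 1-3 can be computed per query as sketched below. This is a generic sketch of the metrics; it assumes (our assumption about the exact protocol) that a retrieved image counts as relevant when it shares the query's subject or digit label.

```python
import numpy as np

def precision_recall_at(f, query_idx, labels, m):
    """Precision@m and Recall@m for one query, given ranking scores f.

    An item is counted as relevant if it has the same class label as
    the query; the query itself is excluded from the ranking.
    """
    scores = f.copy()
    scores[query_idx] = -np.inf              # exclude the query itself
    ranked = np.argsort(-scores)[:m]         # top-m by descending score
    relevant = labels == labels[query_idx]
    relevant[query_idx] = False
    hits = relevant[ranked].sum()
    return hits / m, hits / max(relevant.sum(), 1)
```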
Table 1  Retrieval performance (%) for YALE.

Methods                  Precision@10   Recall@10
Euclidean Distance       66.61          60.55
SD [12]                  69.03          62.75
PPR [24]                 69.03          62.75
Manifold Ranking [15]    68.85          62.59
GT [14]                  68.91          62.65
RAN (ours)

Table 2  Retrieval performance (%) for ORL.

Methods                  Precision@15   Recall@15
Euclidean Distance       41.56          62.35
SD [12]                  46.87          70.30
PPR [24]                 47.15          70.73
Manifold Ranking [15]    47.35          71.02
GT [14]                  48.97          73.45
RAN (ours)

Table 3  Retrieval performance (%) for USPS.

Methods                  Precision@50   Recall@50
Euclidean Distance       45.53          56.91
SD [12]                  47.42          59.27
PPR [24]                 47.39          59.24
Manifold Ranking [15]    47.42          59.28
GT [14]                  46.18          57.72
RAN (ours)

Fig. 3  Retrieval performance (%) vs. the number of neighbors k on USPS.
4 Conclusion

We study the data ranking problem by capturing the underlying geometry of the data manifold. Instead of relying on fixed-weight data graphs, we propose a new ranking algorithm that learns the data affinity matrix and the ranking scores simultaneously. The proposed optimization formulation assigns adaptive neighbors to each data point based on the local connectivity, and the smoothness constraint assigns similar ranking scores to similar data points. An efficient algorithm is developed to solve the optimization problem. Evaluations using synthetic and real datasets demonstrate the superior performance of the proposed algorithm.
References

[1] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
[2] Jingrui He, Mingjing Li, Hong-Jiang Zhang, Hanghang Tong, and Changshui Zhang. Manifold-ranking based image retrieval. In Proceedings of the 12th Annual ACM International Conference on Multimedia, pages 9–16. ACM, 2004.
[3] Hanghang Tong, Jingrui He, Mingjing Li, Wei-Ying Ma, Hong-Jiang Zhang, and Changshui Zhang. Manifold-ranking-based keyword propagation for image retrieval. EURASIP Journal on Advances in Signal Processing, 2006(1):079412, 2006.
[4] Song Bai, Xiang Bai, Qi Tian, and Longin Jan Latecki. Regularized diffusion process for visual retrieval. In AAAI, pages 3967–3973, 2017.
[5] Michael Donoser and Horst Bischof. Diffusion processes for retrieval revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1320–1327, 2013.
[6] Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon, and Ondrej Chum. Efficient diffusion on region manifolds: Recovering small objects with compact CNN representations. In CVPR, 2017.
[7] Yi Yang, Dong Xu, Feiping Nie, Jiebo Luo, and Yueting Zhuang. Ranking with local regression and global alignment for cross media retrieval. In Proceedings of the 17th ACM International Conference on Multimedia, pages 175–184. ACM, 2009.
[8] Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, and Hsiao-Wuen Hon. Adapting ranking SVM to document retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '06, pages 186–193. ACM, 2006.
[9] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
[10] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[11] Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast random walk with restart and its applications. In ICDM, pages 613–622. IEEE, 2006.
[12] Bo Wang and Zhuowen Tu. Affinity learning via self-diffusion for image segmentation and clustering. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2312–2319. IEEE, 2012.
[13] Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, 2003.
[14] Xiang Bai, Xingwei Yang, Longin Jan Latecki, Wenyu Liu, and Zhuowen Tu. Learning context-sensitive shape similarity by graph transduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):861–874, 2010.
[15] Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf. Ranking on data manifolds. In NIPS, pages 169–176, 2003.
[16] Martin Szummer and Tommi Jaakkola. Partially labeled classification with Markov random walks. In NIPS, pages 945–952, 2001.
[17] X. Yang, S. Koknar-Tezel, and L. J. Latecki. Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval. In CVPR, pages 357–364, 2009.
[18] Xingwei Yang, Lakshman Prasad, and Longin Jan Latecki. Affinity learning with diffusion on tensor product graph. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):28–38, 2013.
[19] Jun Wang, Tony Jebara, and Shih-Fu Chang. Graph transduction via alternating minimization. In Proceedings of the 25th International Conference on Machine Learning, pages 1144–1151. ACM, 2008.
[20] Feiping Nie, Xiaoqian Wang, and Heng Huang. Clustering and projected clustering with adaptive neighbors. In KDD, pages 977–986, 2014.
[21] Athinodoros S. Georghiades, Peter N. Belhumeur, and David J. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):643–660, 2001.
[22] Ferdinando S. Samaria and Andy C. Harter. Parameterisation of a stochastic model for human face identification. In Proceedings of the Second IEEE Workshop on Applications of Computer Vision, pages 138–142. IEEE, 1994.
[23] Jonathan J. Hull. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550–554, 1994.
[24] Taher H. Haveliwala. Topic-sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web, pages 517–526. ACM, 2002.
Feiping Nie received the Ph.D. degree in Computer Science from Tsinghua University, China, in 2009, and is currently a full professor at Northwestern Polytechnical University, China. His research interests are machine learning and its applications, such as pattern recognition, data mining, computer vision, image processing, and information retrieval. He has published more than 100 papers in top journals and conferences, including TPAMI, IJCV, TIP, TNNLS/TNN, TKDE, Bioinformatics, ICML, NIPS, KDD, IJCAI, AAAI, ICCV, CVPR, and ACM MM. His papers have been cited more than 7000 times, and his H-index is 48. He is now serving as an associate editor or PC member for several prestigious journals and conferences in the related fields.