Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging
SUBMISSION FOR IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018
Jinhui Tang, Xiangbo Shu, Zechao Li, Yu-Gang Jiang, and Qi Tian,
Fellow, IEEE
Abstract: Image retagging aims to improve the tag quality of social images by completing the missing tags, rectifying the noise-corrupted tags, and assigning new high-quality tags. Recent approaches simultaneously explore visual, user and tag information to improve the performance of image retagging by mining the tag-image-user associations. However, such methods become computationally infeasible with the rapidly increasing number of images, tags and users. It has been proven that the anchor graph can significantly accelerate large-scale graph-based learning by exploring only a small number of anchor points. Inspired by this, we propose a novel Social anchor-Unit GrAph Regularized Tensor Completion (SUGAR-TC) method to efficiently refine the tags of social images, which is insensitive to the scale of data. First, we construct an anchor-unit graph across multiple domains (e.g., image and user domains) rather than the traditional anchor graph in a single domain. Second, a tensor completion based on Social anchor-Unit GrAph Regularization (SUGAR) is implemented to refine the tags of the anchor images. Finally, we efficiently assign tags to non-anchor images by leveraging the relationship between the non-anchor units and the anchor units. Experimental results on a real-world social image database demonstrate the effectiveness and efficiency of SUGAR-TC, which outperforms the state-of-the-art methods.
Index Terms: Image retagging, anchor graph, tensor completion, image retrieval.
I. INTRODUCTION
In the past decades, various social image retagging methods [1]–[6] have been proposed to improve the tag quality of social images. At the early stage, image retagging methods refined the tags of social images by utilizing the semantic correlation among tags and the visual similarity among images [7], [8]. A basic assumption of such methods is that two images with high visual similarity should have similar semantic tags, and vice versa. To better leverage the available image-tag inter-association, some researchers improved matrix completion based on low-rank decomposition to simultaneously recover the missing tags and remove the noisy tags [9], [10]. Candes and Plan [11] have proven that the matrix completion model is able to complete the missing entries from a small number of observed entries in the original matrix between the dyadic data. However, the aforementioned methods achieve unsatisfactory retagging results when there is ambiguity between the visual content and the label taxonomy (e.g., the WordNet taxonomy [12]).

Intuitively, a set of images uploaded by one common user tends to have close relations. That is to say, the user information (e.g., user interests and backgrounds) bridges the inter-relationship between tags and social images [13], especially for certain tags, e.g., geo-related tags and event tags. Therefore, several methods were proposed to simultaneously leverage the visual information, tag information, and user information for social image retagging. Sang et al. [2] attempted to model the inter-association among users, images and tags, and presented a tensor completion based on low-rank decomposition to refine the social image tags. However, since the process of tensor completion generates many large-scale temporary matrices and tensors, it requires an extremely large computational cost. To handle this problem, Tang et al. [4] proposed Tri-clustered Tensor Completion (TTC) to divide the original tag-image-user tensor among all tags, images, and users into several sub-tensors, and to implement the sub-tensor completion in parallel. However, the computational cost still increases as the number of images increases.

J. Tang, X. Shu, and Z. Li are with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China. E-mail: {jinhuitang, shuxb, zechao.li}@njust.edu.cn. (Corresponding author: Xiangbo Shu)
Y.-G. Jiang is with the School of Computer Science, Fudan University, Shanghai 201203, China. E-mail: [email protected].
Q. Tian is with the Department of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249-1604, USA. E-mail: [email protected].

Fig. 1. (a) Traditional methods [4], [5] refine tags on the whole dataset. (b) The proposed SUGAR-TC refines tags on a small anchor-unit set, where an anchor unit consists of an image and a user. Conventional methods refine tags of images by exploring the inter-association among all data. This comes at the price of increased computational cost when faced with a larger number of images. The proposed method refines tags on a smaller-scale anchor-unit set, and then assigns tags to all non-anchor images in an efficient way, regardless of the data scale.

To this end, we aim to develop an efficient image retagging framework to assign high-quality tags to social images, regardless of the data scale. We observe that the anchor graph can significantly accelerate large-scale graph-based learning by exploring only a small number of anchor points on the graph [14], [15]. Inspired by this, we propose a Social anchor-Unit GrAph Regularization (SUGAR) model on a tag-image-user graph by exploring a small number of representative anchor units, where one anchor unit consists of an anchor image and an anchor user, as shown in Figure 1. Accordingly, we propose a novel Social anchor-Unit GrAph Regularized Tensor Completion (SUGAR-TC) method to efficiently refine the tags of social images. The framework of SUGAR-TC is shown in Figure 2, which mainly consists of three modules. (a) Anchor-unit graph construction module.
We employ the co-clustering algorithm to obtain the representative anchor units in multiple domains (i.e., image and user domains) rather than the traditional anchors in a single domain, and then construct a social anchor-unit graph. (b) SUGAR tensor completion module. We propose a tensor completion model with Social anchor-Unit GrAph Regularization (SUGAR). It completes the anchor tensor by decomposing the non-anchor tensor to refine the tags of anchor images. (c) Anchor-aware tag assignment module. By leveraging the potential relationships between non-anchor units and anchor units, we use the weighted average tags of anchor images to efficiently assign high-quality tags to the non-anchor images.

Fig. 2. The whole framework of SUGAR-TC for social image retagging. (a) Social anchor-unit graph construction: obtain anchor units and construct the anchor-unit graph with the inter- and intra-adjacency edges. (b) SUGAR tensor completion: refine tags of anchor images. (c) Anchor-aware tag assignment: assign tags to non-anchor images via the inter-association between non-anchor units and anchor units.

Overall, the main contributions of this work are two-fold. (1) We propose an efficient Social anchor-Unit GrAph Regularized Tensor Completion (SUGAR-TC) method for social image retagging, which remains efficient even when the data scale is very large. (2) To the best of our knowledge, this is the first time that social anchor units across multiple domains, rather than the traditional anchors in a single domain, are presented. In the experiments, the proposed SUGAR-TC gains superior performance in both effectiveness and efficiency compared with the state-of-the-art methods.

II. RELATED WORK
A. Social Image Retagging
The goal of social image retagging is to improve tag quality by recovering missing tags and removing noisy tags [16], [17]. It is an essential step for large-scale tag-based image retrieval. The image retagging task is closely related to tag refinement [4], [18] and tag completion [3]. Overall, social image retagging methods have gone through two stages, i.e., image-tag association based image retagging [1], [3], [7], [8], [19]–[25], and tag-image-user association based image retagging [2], [4], [5], [26].

The image-tag association based methods focus on exploring inter- and intra-associations among images and tags. As one of the most classic works, Liu et al. [7] assumed that two images with higher visual similarity are more likely to have similar or common tags. Yang et al. [8] proposed to mine the multi-tag intra-association for tag expansion and denoising. Chen et al. [20] proposed co-regularized learning with two classifiers by jointly mapping visual features and text features into a common subspace.

Matrix completion based on low-rank decomposition is widely applied to image retagging. As an early study, Zhu et al. [1] solved the image retagging problem by factorizing the image-tag inter-association matrix into an approximately low-rank completed matrix and a sparse error matrix, where the low-rank completed matrix reveals the image-tag associations. Besides, Feng et al. [27] theoretically showed that the matrix completion method is able to simultaneously recover missing tags and remove noisy tags even with a limited number of observations. Motivated by the success of the low-rank model in image tagging, Xu et al. [9] proposed Non-linear Matrix Completion (NMC) for social image tagging by constructing the original image-tag matrix with a non-linear kernel mapping. Li et al. [10] divided the original image-tag matrix into several sub-matrices, and proposed a Locality Sensitive Low-Rank (LSLR) model on each sub-matrix to recover the image-tag matrix via matrix factorization.

Tag-image-user association based image retagging methods construct a tag-image-user inter-association tensor instead of the image-tag inter-association matrix to explore the important user information [2]. This can further improve the performance of image retagging with the help of user information. Tang et al. [4] proposed a Tri-clustered Tensor Completion (TTC) framework to first divide the original super-sparse tensor into several sub-tensors, and then complete all sub-tensors regularized by a tensor kernel. However, to a certain degree, TTC breaks the tag-image-user inter-association when dividing the original super-sparse tensor into several sub-tensors.
B. Anchor Graph-based Learning
Graph-based learning methods have achieved impressive performance in various applications [28]. However, such methods incur a high computational cost when the size of the data rapidly increases. Therefore, several strategies have been proposed to reduce the computational cost of graph-based learning. These strategies fall into three categories. The first strategy uses neighborhood propagation from the approximate neighborhood graph to the current graph [29]–[32]. The second strategy utilizes hashing to improve efficiency [33]–[36]. The last strategy employs the anchor graph to simultaneously reduce the computational cost and memory cost [15], [37]–[40]. On an anchor graph, anchors covering a vast data point cloud can predict the label for each data point, even when the data size is large. As a result, the anchor graph model has been successfully applied to many practical tasks, including image retrieval [41], face recognition [42], image classification [15], object tracking [43], and so on. In this work, we extend the anchor graph in a single domain to the anchor-unit graph across multiple domains, to simultaneously explore the association information among different domains.

III. THE PROPOSED FRAMEWORK
This section introduces the whole framework of SUGAR-TC, as shown in Figure 2. It includes three main modules, i.e., the Social anchor-unit graph construction module, the SUGAR tensor completion module, and the Anchor-aware tag assignment module. In this paper, tensors, matrices, vectors, variables and sets are denoted by calligraphic uppercase letters (e.g., $\mathcal{A}$), bold uppercase letters (e.g., $\mathbf{T}$), bold lowercase letters (e.g., $\mathbf{d}$), letters (e.g., $m$, $I$) and blackboard bold letters (e.g., $\mathbb{I}$), respectively. $\mathbf{T}_{i,j}$ denotes the entry in the $i$-th row and $j$-th column of $\mathbf{T}$. $\mathcal{T}_{i,j,k}$ denotes the entry in the $i$-th row, $j$-th column and $k$-th tube of $\mathcal{T}$. $|\mathbb{I}|$ denotes the number of elements in $\mathbb{I}$. $\mathcal{A}_{[1]}$, $\mathcal{A}_{[2]}$ and $\mathcal{A}_{[3]}$ are the matrix forms of $\mathcal{A}$ obtained by accumulating the entries of $\mathcal{A}$ along the row, column, and tube axes, respectively. For convenience, some important notations are defined in Table I.

A. Background and Problem Definition
In this work, three types of data are collected from photo sharing websites, namely the image set $\mathbb{I} = \{x_i\}_{i=1}^{|\mathbb{I}|}$, the tag set $\mathbb{T} = \{t_j\}_{j=1}^{|\mathbb{T}|}$ and the user set $\mathbb{U} = \{u_k\}_{k=1}^{|\mathbb{U}|}$, where $x_i$, $t_j$ and $u_k$ denote the $i$-th image, the $j$-th tag, and the $k$-th user, respectively. Since many tags freely given by users with various backgrounds and interests are ambiguous, noisy and incomplete, the original user-provided tags provide only weak supervision and cannot represent the semantics of images well. If we directly perform tag-based image retrieval on social images with the original tags, the retrieval performance will degrade. In this work, we aim to refine the tags of social images by utilizing the associations among tags, images and users. However, the numbers of images, tags and users on photo sharing websites are large, even huge, which dramatically increases the computational cost.

It has been proven that the anchor graph can significantly accelerate large-scale graph-based learning by exploring only a small number of anchor points. However, the traditional anchor graph is constructed only in the image domain. To leverage the user information, we extend the traditional anchor to an anchor unit, which is a tuple consisting of an anchor image and an anchor user. The graph constructed on the anchor units is called the anchor-unit graph.

Let $\mathbb{I}_n$, $\mathbb{U}_n$, $\mathbb{I}_m$, and $\mathbb{U}_m$ indicate the non-anchor image set, the non-anchor user set, the anchor image set, and the anchor user set, respectively, where $|\mathbb{I}_n| + |\mathbb{I}_m| = |\mathbb{I}|$ and $|\mathbb{U}_n| + |\mathbb{U}_m| = |\mathbb{U}|$. We construct a tag-image-user non-anchor tensor $\mathcal{T} \in \mathbb{R}^{|\mathbb{T}| \times |\mathbb{I}_n| \times |\mathbb{U}_n|}$ among all the tags, non-anchor images and non-anchor users, and a tag-image-user anchor tensor $\mathcal{A}_0 \in \mathbb{R}^{|\mathbb{T}| \times |\mathbb{I}_m| \times |\mathbb{U}_m|}$ among all the tags, anchor images and anchor users, in which some associations are incorrect or missing. Let $\mathcal{A} \in \mathbb{R}^{|\mathbb{T}| \times |\mathbb{I}_m| \times |\mathbb{U}_m|}$ denote the refined anchor tensor among the refined tags, anchor images and anchor users. This work first aims to learn $\mathcal{A}$ with the help of the inter-association in $\mathcal{T}$ and $\mathcal{A}_0$. After that, benefiting from the tag-image-user association in $\mathcal{A}$, we can efficiently retag the non-anchor images based on the inter-relationship between non-anchor units and anchor units on the anchor-unit graph.

B. Social Anchor-Unit Graph Construction
To construct the anchor-unit graph, we need to obtain the anchor units from the image and user data. Generally speaking, ideal anchor units should satisfy two conditions: 1) they can adequately represent the distribution of the image and user data; 2) the number of anchor units should be much smaller than the number of images [37]. Existing methods employ K-means clustering to obtain the traditional anchors [35], [38]. However, the K-means algorithm, which operates in a single domain, cannot cluster the image and user data simultaneously. Fortunately, the co-clustering algorithm can model the association between two types of data by an association matrix, and then cluster the rows and columns of this matrix simultaneously into several co-clusters [44]. Therefore, we build an image-user association matrix $\mathcal{D}_{[1]} \in \mathbb{R}^{|\mathbb{I}| \times |\mathbb{U}|}$ by accumulating the entries of the tag-image-user tensor $\mathcal{D}$ along the tag axis, and adopt the co-clustering algorithm instead of the K-means algorithm to find $C$ co-cluster centers. Subsequently, we select the $m_c$ image-user units (in each unit, the user uploads the image) that are closest to the $c$-th co-cluster center as anchor units. In total, we obtain $m$ image-user anchor units for the $C$ co-cluster centers, where $m = m_c \times C$.

TABLE I
NOTATIONS AND DEFINITIONS.
Notation          Definition
$\mathbb{I}$      Image set.
$\mathbb{U}$      User set.
$\mathbb{T}$      Tag set.
$\mathbb{I}_n$    Non-anchor image set.
$\mathbb{U}_n$    Non-anchor user set.
$\mathbb{I}_m$    Anchor image set.
$\mathbb{U}_m$    Anchor user set.
$\mathcal{D}$     Original tag-image-user tensor on $\mathbb{T}$, $\mathbb{I}$ and $\mathbb{U}$.
$\mathcal{T}$     Original tag-image-user non-anchor tensor on $\mathbb{T}$, $\mathbb{I}_n$ and $\mathbb{U}_n$.
$\mathcal{A}_0$   Original tag-image-user anchor tensor on $\mathbb{T}$, $\mathbb{I}_m$ and $\mathbb{U}_m$.
$\mathcal{A}$     Refined tag-image-user anchor tensor (to be learned).
$\mathbf{I}^m$    Inter-adjacency matrix between non-anchor images and anchor images.
$\mathbf{U}^m$    Inter-adjacency matrix between non-anchor users and anchor users.
$\mathbf{W}^I$    Intra-adjacency matrix among non-anchor images.
$\mathbf{W}^U$    Intra-adjacency matrix among non-anchor users.
$\mathbf{T}$      Intra-adjacency matrix among tags.

Based on the $m$ anchor units, we construct a social anchor-unit graph $\Omega\{\mathbb{I}_n, \mathbb{U}_n, \mathbb{I}_m, \mathbb{U}_m, \mathbb{T}, \omega\}$, where $\omega$ indicates the collection of adjacency edges between the data points. To measure the weight of each edge in the anchor-unit graph $\Omega$, we design two types of inter-adjacency matrices between non-anchor data and anchor data (i.e., the image inter-adjacency matrix and the user inter-adjacency matrix), and three types of intra-adjacency matrices among non-anchor images, non-anchor users and tags (i.e., the image intra-adjacency matrix, the user intra-adjacency matrix, and the tag intra-adjacency matrix).

Inter-adjacency matrices.
First, we design the image inter-adjacency matrix $\mathbf{I}^m \in \mathbb{R}^{|\mathbb{I}_n| \times |\mathbb{I}_m|}$ between the $|\mathbb{I}_n|$ non-anchor images and the $|\mathbb{I}_m|$ anchor images, i.e.,

$$\mathbf{I}^m_{i,j} = \exp\left(-\frac{\|\mathbf{d}_{x_i} - \mathbf{d}_{x_j}\|^2}{\sigma^2}\right), \tag{1}$$

where $\sigma$ is a parameter of the RBF kernel, and $\mathbf{d}_{x_i}$ and $\mathbf{d}_{x_j}$ indicate the features (i.e., CNN features [45]) of a non-anchor image $x_i$ and an anchor image $x_j$, respectively. Here, if $\mathbf{I}^m_{i,j}$ is below a small threshold, we set $\mathbf{I}^m_{i,j} = 0$ to obtain a sparse matrix $\mathbf{I}^m$ and reduce the memory cost.

Second, we also design the user inter-adjacency matrix $\mathbf{U}^m \in \mathbb{R}^{|\mathbb{U}_n| \times |\mathbb{U}_m|}$ between the $|\mathbb{U}_n|$ non-anchor users and the $|\mathbb{U}_m|$ anchor users. Similar to [4], we assume that two users with higher co-occurrence are more likely to be related to each other, and vice versa, namely

$$\mathbf{U}^m_{i,j} = \frac{N(u_i, u_j)}{N(u_i) + N(u_j) - N(u_i, u_j)}, \tag{2}$$

where $N(u_i, u_j)$ denotes the number of groups that both a non-anchor user $u_i$ and an anchor user $u_j$ join, $N(u_i)$ is the number of groups that a non-anchor user $u_i$ joins, and $N(u_j)$ is the number of groups that an anchor user $u_j$ joins.

Intra-adjacency matrices.
We first design the image intra-adjacency matrix $\mathbf{W}^I \in \mathbb{R}^{|\mathbb{I}_n| \times |\mathbb{I}_n|}$ among the non-anchor images. One strategy is to measure the association between two non-anchor images by Eq. (1). However, the number of non-anchor images is much larger than the number of anchor images, which would increase the computational cost. Alternatively, since anchor images are very representative of the whole image set, we can measure the association between non-anchor images by exploring links through common anchor images, as also proposed in [14], [15]. For two non-anchor images, the more common anchor images they share, the stronger their association. Thus, $\mathbf{W}^I$ can be computed as follows,

$$\mathbf{W}^I = \mathbf{I}^m \left(\mathbf{\Lambda}^I\right)^{-1} \left(\mathbf{I}^m\right)^T, \tag{3}$$

where the diagonal matrix $\mathbf{\Lambda}^I$ is defined as $\mathbf{\Lambda}^I_{j,j} = \sum_{i=1}^{|\mathbb{I}_n|} \mathbf{I}^m_{i,j}$ ($1 \le j \le |\mathbb{I}_m|$). From Eq. (3), $\mathbf{W}^I_{i,j} > 0$ means that the two non-anchor images share at least one anchor image; otherwise $\mathbf{W}^I_{i,j} = 0$.

Second, we use the same idea to design the user intra-adjacency matrix $\mathbf{W}^U \in \mathbb{R}^{|\mathbb{U}_n| \times |\mathbb{U}_n|}$ among the non-anchor users, as follows,

$$\mathbf{W}^U = \mathbf{U}^m \left(\mathbf{\Lambda}^U\right)^{-1} \left(\mathbf{U}^m\right)^T, \tag{4}$$

where the diagonal matrix $\mathbf{\Lambda}^U$ is defined as $\mathbf{\Lambda}^U_{j,j} = \sum_{i=1}^{|\mathbb{U}_n|} \mathbf{U}^m_{i,j}$.

Following the definitions in [4], [5], we design the tag intra-adjacency matrix $\mathbf{T} \in \mathbb{R}^{|\mathbb{T}| \times |\mathbb{T}|}$ by using the categorical relations and the co-occurrence, i.e.,

$$\mathbf{T}_{i,j} = a_1 \frac{N(t_i, t_j)}{N(t_i) + N(t_j) - N(t_i, t_j)} + a_2 \cdot \frac{C(L(t_i, t_j))}{C(t_i) + C(t_j)}, \tag{5}$$

where $a_1$ and $a_2$ denote the weight coefficients ($a_1 + a_2 = 1$); $N(t_i)$ denotes the occurrence count of tag $t_i$ in a dataset; $N(t_i, t_j)$ denotes the co-occurrence count for tags $t_i$ and $t_j$ in a dataset; $C(t_i) = -\log(p(t_i))$ is the information content of tag $t_i$; $p(t_i)$ is the probability of tag $t_i$; and $L(t_i, t_j)$ is the least common subsumer of tags $t_i$ and $t_j$ in the WordNet taxonomy [12]. The least common subsumer of two tags in WordNet is the subsumer that does not have any children that are also subsumers of the two tags.

C. SUGAR Tensor Completion
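The completion model of this subsection takes the adjacency matrices of Section III-B as input. As a minimal sketch of Eqs. (1)-(4), the NumPy code below builds the two inter-adjacency matrices and derives the intra-adjacency matrices from them. The function names, the default `sigma`, and the sparsification threshold `eps` are illustrative assumptions rather than the paper's settings, and Eq. (1) is read here as a squared distance divided by $\sigma^2$.

```python
import numpy as np

def image_inter_adjacency(d_non, d_anchor, sigma=1.0, eps=1e-4):
    """Eq. (1): RBF weights between non-anchor and anchor image features.

    d_non: (|I_n|, dim) features, d_anchor: (|I_m|, dim) features.
    sigma and eps are placeholder values (assumptions).
    """
    # Pairwise squared Euclidean distances, shape (|I_n|, |I_m|).
    sq_dist = ((d_non[:, None, :] - d_anchor[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-sq_dist / sigma ** 2)
    weights[weights < eps] = 0.0  # drop tiny weights to keep the matrix sparse
    return weights

def user_inter_adjacency(groups_non, groups_anchor):
    """Eq. (2): overlap of the group sets joined by a non-anchor and an anchor user."""
    u = np.zeros((len(groups_non), len(groups_anchor)))
    for i, g_i in enumerate(groups_non):
        for j, g_j in enumerate(groups_anchor):
            both = len(g_i & g_j)
            union = len(g_i) + len(g_j) - both
            u[i, j] = both / union if union else 0.0
    return u

def intra_adjacency(z):
    """Eqs. (3)-(4): W = Z Lambda^{-1} Z^T with Lambda_jj = sum_i Z_ij."""
    col_sums = z.sum(axis=0).astype(float)
    inv = np.divide(1.0, col_sums, out=np.zeros_like(col_sums), where=col_sums > 0)
    return (z * inv) @ z.T
```

With this construction, `intra_adjacency(I_m)` is nonzero for a pair of non-anchor images only when both have a nonzero weight to at least one common anchor image, which is the property stated after Eq. (3).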
It has been proven that the anchor graph can efficiently deal with the standard multi-class semi-supervised learning problem [37]. Motivated by this, we extend the existing Anchor Graph Regularization [38] to a novel Social anchor-Unit GrAph Regularization (SUGAR). Specifically, we assume that two images linked by the same anchor image should have closely similar tags and closely related users, and that two users linked by the same anchor user prefer to upload images with closely similar tags. Then, we define the regularization as follows,

$$\Theta_1 = \lambda_1 \sum_{i=1,j=1}^{|\mathbb{I}_n|} \mathbf{W}^I_{i,j} \left\| \mathcal{A} \times_2 \mathbf{I}^m_{i,:} - \mathcal{A} \times_2 \mathbf{I}^m_{j,:} \right\|_F^2 + \lambda_2 \sum_{i=1,j=1}^{|\mathbb{U}_n|} \mathbf{W}^U_{i,j} \left\| \mathcal{A} \times_3 \mathbf{U}^m_{i,:} - \mathcal{A} \times_3 \mathbf{U}^m_{j,:} \right\|_F^2, \tag{6}$$

where $\lambda_1$ and $\lambda_2$ are nonnegative coefficients that control the penalty of the corresponding regularization terms. The matrix $\mathcal{A} \times_2 \mathbf{I}^m_{i,:}$ denotes the 2-mode product of $\mathcal{A}$ and $\mathbf{I}^m_{i,:}$ (more details about the $n$-mode product can be found in [46]), and describes the inter-association between the tags and the anchor users corresponding to the $i$-th non-anchor image. The matrix $\mathcal{A} \times_3 \mathbf{U}^m_{i,:}$ describes the inter-association between the tags and the anchor images corresponding to the $i$-th non-anchor user.

The large number of original tags provided by users on websites carries important supervision information, which can guide us to mine the inter-association among images, tags, and users [2]. Accordingly, we construct the tag-image-user non-anchor tensor $\mathcal{T} \in \mathbb{R}^{|\mathbb{T}| \times |\mathbb{I}_n| \times |\mathbb{U}_n|}$ among tags, non-anchor images and non-anchor users. Specifically, if the $j$-th non-anchor image uploaded by the $k$-th non-anchor user is annotated with the $i$-th tag, we set $\mathcal{T}_{i,j,k} = 1$, otherwise $\mathcal{T}_{i,j,k} = 0$, where $1 \le i \le |\mathbb{T}|$, $1 \le j \le |\mathbb{I}_n|$, and $1 \le k \le |\mathbb{U}_n|$. For a tensor with only a few available entries, the tensor completion algorithm can estimate the missing entries and remove the noisy ones by reconstructing an approximately low-rank tensor $\tilde{\mathcal{T}}$, as follows,

$$\min_{\tilde{\mathcal{T}}} \left\| \mathcal{T} - \tilde{\mathcal{T}} \right\|_F^2. \tag{7}$$

The Tucker decomposition [46] provides a factorization to solve for the low-rank tensor $\tilde{\mathcal{T}}$ by the following objective function, i.e.,

$$\min_{\tilde{\mathcal{T}}} \left\| \mathcal{T} - \mathcal{S} \times_1 \mathbf{B} \times_2 \mathbf{C} \times_3 \mathbf{D} \right\|_F^2, \tag{8}$$

where $\tilde{\mathcal{T}} = \mathcal{S} \times_1 \mathbf{B} \times_2 \mathbf{C} \times_3 \mathbf{D}$. Here, $\mathcal{S}$ denotes the core tensor, and $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ denote the factor matrices. Since $\mathbf{T}$, $\mathbf{I}^m$ and $\mathbf{U}^m$ describe the associations between tags and tags, non-anchor images and anchor images, as well as non-anchor users and anchor users, respectively, by setting $\mathbf{B} = \mathbf{T}$, $\mathbf{C} = \mathbf{I}^m$ and $\mathbf{D} = \mathbf{U}^m$, the learned core tensor $\mathcal{S}$ can be regarded as the refined tensor $\mathcal{A}$. To leverage the weakly supervised tag-image-user association, we introduce a new regularization term $\|\mathcal{A} - \mathcal{A}_0\|_F^2$, which constrains the learned $\mathcal{A}$ to be consistent with the original tag-image-user anchor tensor $\mathcal{A}_0$. Finally, we can obtain the refined tensor $\mathcal{A}$ by minimizing the following tensor completion function $\Theta_2$, i.e.,

$$\Theta_2 = \left\| \mathcal{T} - \mathcal{A} \times_1 \mathbf{T} \times_2 \mathbf{I}^m \times_3 \mathbf{U}^m \right\|_F^2 + \alpha \left\| \mathcal{A} - \mathcal{A}_0 \right\|_F^2 + \beta \left\| \mathcal{A} \right\|_F^2, \tag{9}$$

where $\alpha$ and $\beta$ are nonnegative parameters that control the penalty of the regularization. In Eq. (9), the learned $\mathcal{A}$ is a low-rank and compact tensor, which can reveal the association among anchor images, anchor users and tags.

Algorithm 1 SUGAR Tensor Completion
Input: Original non-anchor tensor $\mathcal{T}$, original anchor tensor $\mathcal{A}_0$ on the anchor-unit set, adjacency matrices $\mathbf{T}$, $\mathbf{I}^m$, $\mathbf{U}^m$, $\mathbf{W}^I$ and $\mathbf{W}^U$, parameters $\alpha$, $\beta$, $\lambda_1$ and $\lambda_2$.
Output: Completed anchor tensor $\mathcal{A}$.
Initialization: $\mathcal{A}$.
repeat
    Update $\mathcal{A}$ by Eq. (12).
until Convergence.
return $\mathcal{A}$.

We integrate the social anchor-unit graph regularization (i.e., Eq. (6)) into Eq. (9), and then obtain the completed anchor tensor $\mathcal{A}$ by minimizing the objective function $\Theta = \Theta_1 + \Theta_2$, as follows,

$$\begin{aligned} \min_{\mathcal{A}} \Theta = \min_{\mathcal{A}} \; & \left\| \mathcal{T} - \mathcal{A} \times_1 \mathbf{T} \times_2 \mathbf{I}^m \times_3 \mathbf{U}^m \right\|_F^2 + \alpha \left\| \mathcal{A} - \mathcal{A}_0 \right\|_F^2 + \beta \left\| \mathcal{A} \right\|_F^2 \\ & + \lambda_1 \sum_{i=1,j=1}^{|\mathbb{I}_n|} \mathbf{W}^I_{i,j} \left\| \mathcal{A} \times_2 \mathbf{I}^m_{i,:} - \mathcal{A} \times_2 \mathbf{I}^m_{j,:} \right\|_F^2 \\ & + \lambda_2 \sum_{i=1,j=1}^{|\mathbb{U}_n|} \mathbf{W}^U_{i,j} \left\| \mathcal{A} \times_3 \mathbf{U}^m_{i,:} - \mathcal{A} \times_3 \mathbf{U}^m_{j,:} \right\|_F^2. \end{aligned} \tag{10}$$
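To make the terms of Eq. (10) concrete, the following NumPy sketch evaluates the objective for given inputs, for instance to monitor convergence. It is an illustration under assumptions: the function and argument names are hypothetical, `A0` stands for the original anchor tensor, `Tn` for the non-anchor tensor, all arrays are dense, and the pairwise graph penalty is computed naively, which is only practical for small problems.

```python
import numpy as np

def tucker_expand(A, T, Im, Um):
    """A x_1 T x_2 I^m x_3 U^m: maps the anchor tensor to non-anchor size."""
    return np.einsum('xyz,ax,by,cz->abc', A, T, Im, Um)

def graph_penalty(A, Z, W, mode):
    """sum_{i,j} W_ij * || A x_mode Z_i,: - A x_mode Z_j,: ||_F^2 for mode 2 or 3."""
    # Contract the anchor-image axis (mode 2) or the anchor-user axis (mode 3)
    # with every row of Z; P[i] is the matrix A x_mode Z_i,: .
    spec = 'xyz,iy->ixz' if mode == 2 else 'xyz,iz->ixy'
    P = np.einsum(spec, A, Z)
    diff = P[:, None] - P[None, :]
    return float((W * (diff ** 2).sum(axis=(2, 3))).sum())

def objective(A, A0, Tn, T, Im, Um, WI, WU, alpha, beta, lam1, lam2):
    """Value of Eq. (10): reconstruction + consistency + norm + graph terms."""
    fit = np.sum((Tn - tucker_expand(A, T, Im, Um)) ** 2)
    prior = alpha * np.sum((A - A0) ** 2) + beta * np.sum(A ** 2)
    reg = lam1 * graph_penalty(A, Im, WI, 2) + lam2 * graph_penalty(A, Um, WU, 3)
    return float(fit + prior + reg)
```

A quick sanity check on this sketch: with `A` and `A0` both zero, every term except the reconstruction error vanishes, so the value reduces to the squared Frobenius norm of `Tn`.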
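In this work, we learn $\mathcal{A}$ by iterative updating. Specifically, we compute the partial derivative of the objective function $\Theta$ with respect to $\mathcal{A}$, i.e.,

$$\begin{aligned} \frac{\partial \Theta}{\partial \mathcal{A}} ={}& 2\mathcal{A} \times_1 \left(\mathbf{T}^T \mathbf{T}\right) \times_2 \left((\mathbf{I}^m)^T \mathbf{I}^m\right) \times_3 \left((\mathbf{U}^m)^T \mathbf{U}^m\right) - 2\mathcal{T} \times_1 \mathbf{T}^T \times_2 (\mathbf{I}^m)^T \times_3 (\mathbf{U}^m)^T \\ & + 2\alpha \left(\mathcal{A} - \mathcal{A}_0\right) + 2\beta \mathcal{A} \\ & + \lambda_1 \sum_{i=1,j=1}^{|\mathbb{I}_n|} \mathbf{W}^I_{i,j} \left( \mathcal{A} \times_2 \left(\mathbf{a}_i^T \mathbf{a}_i\right) - \mathcal{A} \times_2 \left(\mathbf{a}_i^T \mathbf{a}_j\right) \right) + \lambda_1 \sum_{i=1,j=1}^{|\mathbb{I}_n|} \mathbf{W}^I_{i,j} \left( \mathcal{A} \times_2 \left(\mathbf{a}_j^T \mathbf{a}_j\right) - \mathcal{A} \times_2 \left(\mathbf{a}_j^T \mathbf{a}_i\right) \right) \\ & + \lambda_2 \sum_{i=1,j=1}^{|\mathbb{U}_n|} \mathbf{W}^U_{i,j} \left( \mathcal{A} \times_3 \left(\mathbf{b}_i^T \mathbf{b}_i\right) - \mathcal{A} \times_3 \left(\mathbf{b}_i^T \mathbf{b}_j\right) \right) + \lambda_2 \sum_{i=1,j=1}^{|\mathbb{U}_n|} \mathbf{W}^U_{i,j} \left( \mathcal{A} \times_3 \left(\mathbf{b}_j^T \mathbf{b}_j\right) - \mathcal{A} \times_3 \left(\mathbf{b}_j^T \mathbf{b}_i\right) \right) \\ ={}& 2\mathcal{A} \times_1 \left(\mathbf{T}^T \mathbf{T}\right) \times_2 \left((\mathbf{I}^m)^T \mathbf{I}^m\right) \times_3 \left((\mathbf{U}^m)^T \mathbf{U}^m\right) - 2\mathcal{T} \times_1 \mathbf{T}^T \times_2 (\mathbf{I}^m)^T \times_3 (\mathbf{U}^m)^T \\ & + 2\alpha \left(\mathcal{A} - \mathcal{A}_0\right) + 2\beta \mathcal{A} \\ & + 2\lambda_1 \sum_{i=1,j=1}^{|\mathbb{I}_n|} \mathbf{W}^I_{i,j} \left( \mathcal{A} \times_2 \left(\mathbf{a}_i^T \mathbf{a}_i\right) - \mathcal{A} \times_2 \left(\mathbf{a}_i^T \mathbf{a}_j\right) \right) \\ & + 2\lambda_2 \sum_{i=1,j=1}^{|\mathbb{U}_n|} \mathbf{W}^U_{i,j} \left( \mathcal{A} \times_3 \left(\mathbf{b}_i^T \mathbf{b}_i\right) - \mathcal{A} \times_3 \left(\mathbf{b}_i^T \mathbf{b}_j\right) \right), \end{aligned} \tag{11}$$

where $\mathbf{a}_i = \mathbf{I}^m_{i,:}$, $\mathbf{a}_j = \mathbf{I}^m_{j,:}$, $\mathbf{b}_i = \mathbf{U}^m_{i,:}$, and $\mathbf{b}_j = \mathbf{U}^m_{j,:}$, and $\mathcal{A}_0$ is the original anchor tensor. The second equality uses the symmetry of $\mathbf{W}^I$ and $\mathbf{W}^U$. Therefore, the multiplicative updating rule [47] of $\mathcal{A}$ is

$$\mathcal{A}_{i,j,k} \leftarrow \mathcal{A}_{i,j,k} \frac{\left( \mathcal{H} + \alpha \mathcal{A}_0 + \lambda_1 \mathcal{Q} + \lambda_2 \mathcal{P} \right)_{i,j,k}}{\left( \mathcal{G} + (\alpha + \beta) \mathcal{A} + \lambda_1 \mathcal{U} + \lambda_2 \mathcal{V} \right)_{i,j,k}}, \tag{12}$$

where

$$\begin{aligned} \mathcal{H} &= \mathcal{T} \times_1 \mathbf{T}^T \times_2 (\mathbf{I}^m)^T \times_3 (\mathbf{U}^m)^T, \\ \mathcal{G} &= \mathcal{A} \times_1 \left(\mathbf{T}^T \mathbf{T}\right) \times_2 \left((\mathbf{I}^m)^T \mathbf{I}^m\right) \times_3 \left((\mathbf{U}^m)^T \mathbf{U}^m\right), \\ \mathcal{Q} &= \sum_{i=1,j=1}^{|\mathbb{I}_n|} \mathbf{W}^I_{i,j} \left( \mathcal{A} \times_2 \left(\mathbf{a}_i^T \mathbf{a}_j\right) \right), \qquad \mathcal{P} = \sum_{i=1,j=1}^{|\mathbb{U}_n|} \mathbf{W}^U_{i,j} \left( \mathcal{A} \times_3 \left(\mathbf{b}_i^T \mathbf{b}_j\right) \right), \\ \mathcal{U} &= \sum_{i=1,j=1}^{|\mathbb{I}_n|} \mathbf{W}^I_{i,j} \left( \mathcal{A} \times_2 \left(\mathbf{a}_i^T \mathbf{a}_i\right) \right), \qquad \mathcal{V} = \sum_{i=1,j=1}^{|\mathbb{U}_n|} \mathbf{W}^U_{i,j} \left( \mathcal{A} \times_3 \left(\mathbf{b}_i^T \mathbf{b}_i\right) \right). \end{aligned} \tag{13}$$

The details of the algorithm are described in Algorithm 1.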
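The update of Eqs. (12)-(13) can be sketched in NumPy as below. This is an illustration under assumptions, not the authors' implementation: names are hypothetical, arrays are dense, and a small constant `tiny` is added to the denominator to avoid division by zero. The double sums over $i, j$ are collapsed with the identities $\sum_{i,j} \mathbf{W}_{i,j}\, \mathbf{a}_i^T \mathbf{a}_j = (\mathbf{I}^m)^T \mathbf{W}^I \mathbf{I}^m$ and $\sum_{i,j} \mathbf{W}_{i,j}\, \mathbf{a}_i^T \mathbf{a}_i = (\mathbf{I}^m)^T \mathrm{diag}(\mathbf{W}^I \mathbf{1})\, \mathbf{I}^m$ (and likewise for users), so no explicit loop over pairs is needed.

```python
import numpy as np

def mode_mult(A, M1, M2, M3):
    """A x_1 M1 x_2 M2 x_3 M3 for a 3-way array A."""
    return np.einsum('xyz,ax,by,cz->abc', A, M1, M2, M3)

def sugar_update(A, A0, Tn, T, Im, Um, WI, WU, alpha, beta, lam1, lam2, tiny=1e-12):
    """One multiplicative step of Eq. (12); tiny is an added safeguard (assumption)."""
    e1, e2, e3 = (np.eye(n) for n in A.shape)
    H = mode_mult(Tn, T.T, Im.T, Um.T)
    G = mode_mult(A, T.T @ T, Im.T @ Im, Um.T @ Um)
    # Q, P: cross terms; U, V: degree-weighted terms of Eq. (13).
    Q = mode_mult(A, e1, Im.T @ WI @ Im, e3)
    P = mode_mult(A, e1, e2, Um.T @ WU @ Um)
    U = mode_mult(A, e1, Im.T @ (WI.sum(axis=1)[:, None] * Im), e3)
    V = mode_mult(A, e1, e2, Um.T @ (WU.sum(axis=1)[:, None] * Um))
    numerator = H + alpha * A0 + lam1 * Q + lam2 * P
    denominator = G + (alpha + beta) * A + lam1 * U + lam2 * V + tiny
    return A * numerator / denominator

def complete_anchor_tensor(A_init, A0, Tn, T, Im, Um, WI, WU,
                           alpha, beta, lam1, lam2, n_iter=50):
    """Repeat the update a fixed number of times (n_iter is a placeholder)."""
    A = A_init.copy()
    for _ in range(n_iter):
        A = sugar_update(A, A0, Tn, T, Im, Um, WI, WU, alpha, beta, lam1, lam2)
    return A
```

The fixed iteration count is a simplification; the stopping rule used in the paper is based on the relative change of the objective. Because every factor in the update is nonnegative when the inputs are nonnegative, a nonnegative starting tensor stays nonnegative across iterations.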
Here, $\mathcal{A}$ is initialized by $\mathcal{A}_0 + \mathcal{E}$, where $\mathcal{A}_0$ is the original anchor tensor and $\mathcal{E}$ is a random small-disturbance tensor with mean 0. The iterations stop when the relative change of the objective function is smaller than a predefined threshold. The proposed SUGAR-TC converges after about iterations in the experiments.

D. Anchor-Aware Tag Assignment
After obtaining the completed tensor $\mathcal{A}$, we accumulate its entries along the user axis and the image axis to acquire the desired tag-image association matrix $\mathcal{A}_{[3]}$ and tag-user association matrix $\mathcal{A}_{[2]}$, respectively. Then, we employ the completed tags associated with the anchor units to predict tags for the non-anchor images. For one non-anchor image $x_i$ ($i = 1, 2, \cdots, |\mathbb{I}_n|$) with an available user $u_k$ ($k = 1, 2, \cdots, |\mathbb{U}_n|$), we use the weighted average of the tags of the $s$ nearest-neighbor anchor units to estimate its tag vector $\mathbf{y}_i$, as follows,

$$\mathbf{y}_i = \gamma\, \mathbf{I}^m_{i,\ldots} \left[ \left(\mathcal{A}_{[3]}\right)_{:,\ldots} \right]^T + (1 - \gamma)\, \mathbf{U}^m_{k,\ldots} \cdots$$
A. Experimental Settings
We conduct experiments on a real-world social image dataset, NUS-WIDE-128 [48], to evaluate the performance of the proposed SUGAR-TC method. It is extended from the widely-used NUS-WIDE dataset [49], and contains , images, , user-provided tags crawled from the Internet, as well as the manually labeled ground truth of 128 predefined concepts for evaluation. In this dataset, some user IDs are invalid or unavailable, so we delete the corresponding images. Finally, we obtain , images with , user IDs and the information of user groups.

In the experiments, we compare the proposed method with TRVSC [7], LR [1], LSLR [10], NMC [9], MRTF [2], TTC1 [4] and TTC2 [4]. Besides, we also set a baseline method called Original Tagging (OT), where the labels used for evaluation are the user-provided tags crawled from the Internet. To evaluate the performance, the widely-used F-score is adopted. For each concept, the F-score is calculated as $\text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$, and the average F-score over all the concepts is reported. For fair comparison, we use the CNN features [45] and tune the hyper-parameters by grid search for all methods in the experiments. The best results are reported for comparison. All methods in the experiments are run on a server with an 8-core 2.67 GHz CPU and 32 GB memory.
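The evaluation metric described above can be sketched as follows. The function name and the binary-matrix input layout (one row per image, one column per concept) are assumptions made for illustration; concepts with no predictions or no positives are given an F-score of zero here, which is one possible convention rather than something the text specifies.

```python
import numpy as np

def average_f_score(pred, truth):
    """Mean over concepts of 2PR/(P+R); pred and truth are (images, concepts) binary arrays."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    true_pos = (pred & truth).sum(axis=0).astype(float)
    n_pred = pred.sum(axis=0)
    n_true = truth.sum(axis=0)
    # Guard every division so empty concepts yield 0 instead of NaN.
    precision = np.divide(true_pos, n_pred, out=np.zeros_like(true_pos), where=n_pred > 0)
    recall = np.divide(true_pos, n_true, out=np.zeros_like(true_pos), where=n_true > 0)
    denom = precision + recall
    f = np.divide(2 * precision * recall, denom, out=np.zeros_like(true_pos), where=denom > 0)
    return float(f.mean())

# Two concepts over three images: the first has P = R = 0.5, the second is perfect.
score = average_f_score([[1, 0], [1, 1], [0, 0]], [[1, 0], [0, 1], [1, 0]])
```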
B. Results and Analysis
In this section, we conduct experiments to evaluate the effectiveness of the proposed SUGAR-TC. The results in terms of the average F-score are presented in Table II. We find that the proposed SUGAR-TC achieves the highest F-score among all compared methods. MRTF, TTC1 and TTC2 utilize the inter- and intra-associations among images, tags and users, thus they generally perform better than TRVSC, LR, NMC and LSLR; the exception is MRTF, whose average F-score of 0.423 is below the 0.435 of LSLR. In particular, since TTC1 and TTC2 address the super-sparse problem existing in the original tensor, they gain higher F-scores than MRTF, which directly completes the entries in the original tensor. However, to a certain degree, both TTC1 and TTC2 break the latent inter-association among images, tags and users when dividing the original tensor into several sub-tensors. In contrast, the proposed SUGAR-TC does not break the inherent tag-image-user inter-association. Thus, the F-score obtained by SUGAR-TC is 0.485, compared with 0.458 for TTC2.

TABLE II
AVERAGE F-SCORES OF DIFFERENT METHODS FOR IMAGE RETAGGING.
Method        Average F-score
OT            0.379
TRVSC [7]     0.390
LR [1]        0.410
NMC [9]       0.417
LSLR [10]     0.435
MRTF [2]      0.423
TTC1 [4]      0.447
TTC2 [4]      0.458
SUGAR-TC      0.485

We further detail the F-scores obtained by different methods on all the 128 predefined concepts, as shown in Figure 3. We can see that SUGAR-TC achieves the highest F-scores on most of these concepts. We can also see that all image retagging methods improve the quality of almost all concepts compared with OT. By adding user information, MRTF, TTC1, TTC2 and SUGAR-TC perform better than LR, NMC, LSLR and TRVSC, especially for some summarized or complex tags (e.g., “military”, “nighttime”, and “cityscape”). In particular, for the frequently-used tags (e.g., “bus”, “flowers”, and “sunset”) in Figure 3, SUGAR-TC shows remarkable improvement over TTC1 and TTC2. One reason may be that TTC1 and TTC2 break the tag-image-user inter-association for such common tags when decomposing the original tensor into several sub-tensors.

Fig. 3. The comparisons of detailed F-scores of different methods (OT, TRVSC, LR, NMC, LSLR, MRTF, TTC1, TTC2 and SUGAR-TC). For those concepts denoted with the black bounding box, the proposed SUGAR-TC shows remarkable improvements. Best viewed in color.

We also show some tag assignment results for some specific cases, as shown in Figure 4. We can see that SUGAR-TC can effectively assign tags to images, even for globally complex images (e.g., Figure 4(c)) and locally abstract images (e.g., Figure 4(d)), since SUGAR-TC can simultaneously mine the context information between users and images, and the semantic relation between images and tags. Besides, although some tags, i.e., geo-related tags, event tags, and time-related tags, can hardly be inferred by only using visual information, SUGAR-TC, which considers extra user information, can correctly infer them. For example, as shown in Figure 4(a), the train is a landmark of Japan, thus this image should be tagged with the geo-related tag “Japan”. It is very difficult to reveal the relation between the tag “Japan” and this image by only mining the visual and tag information, but SUGAR-TC can do it well based on the user background information.
Similarly in Figure 4(b), thisimage shows the Obamas speaking of the United States presidentialelection, it is reasonable to assign the event tag “election” to thisimage.
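To make the comparison in Table II concrete, the gain of SUGAR-TC over each compared method can be tabulated with a few lines of Python. This is only a minimal sketch: the scores are copied from Table II, while the helper name `gains` and the choice to report both absolute and relative differences are illustrative assumptions rather than part of the original evaluation protocol.

```python
# Average F-scores copied from Table II.
F_SCORES = {
    "OT": 0.379, "TRVSC": 0.390, "LR": 0.410, "NMC": 0.417, "LSLR": 0.435,
    "MRTF": 0.423, "TTC1": 0.447, "TTC2": 0.458, "SUGAR-TC": 0.485,
}


def gains(scores, target="SUGAR-TC"):
    """Hypothetical helper: (absolute gain, relative gain in %) of `target` over each other method."""
    ref = scores[target]
    return {
        name: (round(ref - value, 3), round(100.0 * (ref - value) / value, 1))
        for name, value in scores.items()
        if name != target
    }


for name, (abs_gain, rel_gain) in gains(F_SCORES).items():
    print(f"{name:6s} +{abs_gain:.3f} ({rel_gain:.1f}%)")
```

Reporting both figures avoids ambiguity about whether a stated improvement is an absolute difference in F-score or a percentage of the baseline.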
Fig. 4. Some results of image retagging obtained by SUGAR-TC. Panels (a)-(f) list the original and refined tags of each image.

TABLE III
DETAILED COMPARISONS OF COMPUTATIONAL TIME (HOUR) AND MEMORY (GB) ON NUS-WIDE-128. “/” DENOTES “TIME/MEMORY”.
Method             MRTF [2]   TTC1 [4]  TTC2 [4]  SUGAR-TC
Clustering         -          4.3/8.6   4.3/8.6   2.1/5.5
Adjacency matrix   1.6/1.8    1.6/1.8   1.6/1.8   0.7/0.07
Tensor completion  12.3/10.3  5.7/6.6   29.6/8.2  1.2/10.1
Tag assignment     -          -         -         0.2/0.08
Total time         13.9       11.6      35.5      4.2
C. Computational Cost Analysis
To illustrate the efficiency of the proposed SUGAR-TC, we also compare the computational cost of SUGAR-TC with that of the closely related methods, including MRTF [2], TTC1 [4], and TTC2 [4]. Table III lists the computing time and memory cost of these methods in detail. We can see that: 1) SUGAR-TC, with a total computing time of 4.2h, runs much faster than MRTF, TTC1 and TTC2; 2) the memory cost of SUGAR-TC is higher than that of TTC1 and TTC2, but lower than that of MRTF. Jointly considering the accuracy, computing time and memory cost, SUGAR-TC is more applicable for image retagging.
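The totals in Table III, and the two observations above, can be checked with a short Python sketch. The per-stage values are copied from Table III. The helper name `summarize`, and the assumption that the overall memory cost of a method is the maximum over its stages (i.e., that the stages run sequentially), are illustrative and not stated in the original text.

```python
# Per-stage (time in hours, memory in GB) from Table III, in the order:
# clustering, adjacency matrix, tensor completion, tag assignment.
# None marks a "-" entry (stage not used by that method).
TABLE_III = {
    "MRTF":     [None, (1.6, 1.8), (12.3, 10.3), None],
    "TTC1":     [(4.3, 8.6), (1.6, 1.8), (5.7, 6.6), None],
    "TTC2":     [(4.3, 8.6), (1.6, 1.8), (29.6, 8.2), None],
    "SUGAR-TC": [(2.1, 5.5), (0.7, 0.07), (1.2, 10.1), (0.2, 0.08)],
}


def summarize(stages):
    """Hypothetical helper: total time (sum of stages) and peak memory (max over stages)."""
    used = [stage for stage in stages if stage is not None]
    total_time = round(sum(hours for hours, _ in used), 1)
    peak_memory = max(memory for _, memory in used)  # assumption: stages run one after another
    return total_time, peak_memory


summary = {name: summarize(stages) for name, stages in TABLE_III.items()}
sugar_time = summary["SUGAR-TC"][0]
for name, (hours, memory) in summary.items():
    print(f"{name:9s} total={hours:5.1f}h  peak={memory:5.1f}GB  time ratio vs SUGAR-TC={hours / sugar_time:.2f}")
```

Under this reading, SUGAR-TC's peak memory is dominated by the tensor completion stage, which is consistent with it exceeding TTC1 and TTC2 in memory while remaining slightly below MRTF.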
D. Parameter Sensitivity Analysis
In the proposed SUGAR-TC method, there are several hyper-parameters to be set in advance. Following [4], the weight coefficients a and a in Eq. (5) are set to . and . respectively. The radius parameter δ in Eq. (1) is set to . . The number of anchor units is set as $m_c = 10$ for each co-cluster center. In the following, we discuss the sensitivity of the other parameters.

For the number of image clusters (denoted by $C_i$) and the number of user clusters (denoted by $C_u$), we tune them over $C_i \in \{10, 20, 40, 60, 80, 100, 200, 500\}$ and $C_u \in \{3, 6, 12, 18, 24, 30, 50, 200\}$, respectively. The corresponding F-scores are shown in Figure 5(a) and Figure 5(b), respectively. From the results, we can see that the best result is obtained with $C_i = 40$ and $C_u = 12$. When these two parameters are set within [20 , and [6 , respectively, the performance changes only slightly. In the experiments, the default numbers of image clusters and user clusters are set to 40 and 12, respectively.

For α, β, λ, λ, and γ, we tune them by α = { }, β = { }, λ = {0, 0.01, 0.05, 0.1, 0.25, 0.75, 1, 10}, λ = { }, and γ = {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}. The corresponding results are shown in Figure 5(c-e). SUGAR-TC achieves the best result with α = 0. , β = 0. , λ = 0. , λ = 0. , and γ = 0. , which are set as the default values of these parameters in the experiments.

E. Application: Image Retrieval
In this work, the image retagging task not only adds and removes tags for each image, but also re-ranks the completed tag list by assigning the tags different confidence scores. For each image, we thus obtain a tag ranking list, which can improve the performance of tag-based image retrieval. Therefore, we conduct an experiment on tag-based image retrieval to illustrate the effectiveness of SUGAR-TC by comparing it with other related methods, including TRVSC [7], LR [1], NMC [9], LSLR [10], MRTF [2], TTC1 [4] and TTC2 [4].

Following the experimental setting in [7], we perform tag-based social image retrieval with ten queries on NUS-WIDE-128, i.e., birds, building, butterfly, dog, fish, flowers, horses, plane, sunset and zoo. Average Precision (AP) is used as the evaluation metric. Specifically, given a ranked list with length $T$, AP is defined as $\mathrm{AP} = \frac{1}{R} \sum_{r=1}^{T} p(r) \cdot \chi(r)$. Here, $R$ is the number of relevant images in this ranked list, $p(r)$ is the precision at cut-off top $r$, and $\chi(r)$ is an indicator function: $\chi(r) = 1$ if the $r$-th image is relevant with respect to the ground-truth concept, and $\chi(r) = 0$ otherwise. Finally, we use the mean AP (MAP) over all queries to evaluate the overall performance. Figure 6 shows the MAP values with different ranking lengths $T$. We can see that the proposed SUGAR-TC outperforms the other image retagging methods. This demonstrates that SUGAR-TC is more effective than other related image retagging methods in terms of tag-based social image retrieval.

V. CONCLUSIONS
In this work, we propose a novel Social anchor-Unit GrAph Regularized Tensor Completion (SUGAR-TC) method to efficiently refine the tags of social images, regardless of the data scale. First, we utilize the co-clustering algorithm to obtain the representative anchor units (anchor image and anchor user) across image and user domains rather than traditional anchors in a single domain, and then construct an anchor-unit graph with multiple intra/inter adjacency edges. Second, we present a SUGAR tensor completion to refine the tags of anchor images. Finally, we efficiently assign high-quality tags to all non-anchor images by leveraging the potential relationships between non-anchor units and anchor units. Experimental results on a real-world social image database demonstrate the effectiveness and efficiency of SUGAR-TC compared with the state-of-the-art methods.

REFERENCES
[1] G. Zhu, S. Yan, and Y. Ma, “Image tag refinement towards low-rank, content-tag prior and error sparsity,” in ACM MM, 2010.
[2] J. Sang, J. Liu, and C. Xu, “Exploiting user information for image tag refinement,” in ACM MM, 2011.
[3] L. Wu, R. Jin, and A. K. Jain, “Tag completion for image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp. 716–727, 2013.
[4] J. Tang, X. Shu, G. Qi, Z. Li, M. Wang, S. Yan, and R. Jain, “Tri-clustered tensor completion for social-aware image tag refinement,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 8, pp. 1662–1674, 2017.
[5] J. Sang, C. Xu, and J. Liu, “User-aware image tag refinement via ternary semantic analysis,” IEEE Transactions on Multimedia, vol. 14, no. 3, pp. 883–895, 2012.
[6] G.-J. Qi, C. C. Aggarwal, Q. Tian, H. Ji, and T. S. Huang, “Exploring context and content links in social media: A latent space method,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 850–862, 2012.
[7] D. Liu, X.-S. Hua, M. Wang, and H.-J. Zhang, “Image retagging,” in ACM MM, 2010.
[8] Y. Yang, Z. Huang, H. T. Shen, and X. Zhou, “Mining multi-tag association for image tagging,” in WWW, 2011.
[9] X. Xu, L. He, H. Lu, A. Shimada, and R.-I. Taniguchi, “Non-linear matrix completion for social image tagging,” IEEE Access, vol. 5, pp. 6688–6696, 2017.
Fig. 5. The curves of F-scores obtained by varying one parameter with the other parameters fixed to their default values. Panels: (a) C_i, (b) C_u, (c) α and β, (d) λ and λ, (e) γ.

Fig. 6. MAP of different methods for tag-based image retrieval (MAP versus ranking length).

[10] X. Li, B. Shen, B.-D. Liu, and Y.-J. Zhang, “A locality sensitive low-rank model for image tag completion,” IEEE Transactions on Multimedia, vol. 18, no. 3, pp. 474–483, 2016.
[11] E. J. Candes and Y. Plan, “Matrix completion with noise,” Proceedings of the IEEE, vol. 98, no. 6, pp. 925–936, 2010.
[12] D. Lin, “Using syntactic dependency as local context to resolve word sense ambiguity,” in ACL, 1997.
[13] P. Cui, S.-W. Liu, W.-W. Zhu, H.-B. Luan, T.-S. Chua, and S.-Q. Yang, “Social-sensed image search,” ACM Transactions on Intelligent Systems and Technology, vol. 32, no. 2, p. 8, 2014.
[14] W. Liu, J. Wang, S. Kumar, and S.-F. Chang, “Hashing with graphs,” in ICML, 2011.
[15] M. Wang, W. Fu, S. Hao, H. Liu, and X. Wu, “Learning on big graph: Label inference and regularization with anchor hierarchy,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 5, pp. 1101–1114, 2017.
[16] M. Wang, B. Ni, X.-S. Hua, and T.-S. Chua, “Assistive tagging: A survey of multimedia tagging with human-computer joint exploration,” ACM Computing Surveys, vol. 44, no. 4, p. 25, 2012.
[17] X. Li, T. Uricchio, L. Ballan, M. Bertini, C. G. Snoek, and A. D. Bimbo, “Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval,” ACM Computing Surveys, vol. 49, no. 1, p. 14, 2016.
[18] J. Fu, J. Wang, Y. Rui, X.-J. Wang, T. Mei, and H. Lu, “Image tag refinement with view-dependent concept representations,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 8, pp. 1409–1422, 2015.
[19] H. Xu, J. Wang, X.-S. Hua, and S. Li, “Tag refinement by regularized LDA,” in ACM MM, 2009.
[20] M. Chen, A. Zheng, and K. Weinberger, “Fast image tagging,” in ICML, 2013.
[21] Z. Li, J. Tang, and T. Mei, “Deep collaborative embedding for social image understanding,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[22] X. Liu, S. Yan, T.-S. Chua, and H. Jin, “Image label completion by pursuing contextual decomposability,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 8, no. 2, p. 21, 2012.
[23] Z. Li and J. Tang, “Weakly supervised deep matrix factorization for social image understanding,” IEEE Transactions on Image Processing, vol. 26, no. 1, pp. 276–288, 2017.
[24] Z. Lin, G. Ding, M. Hu, J. Wang, and X. Ye, “Image tag completion via image-specific and tag-specific linear sparse reconstructions,” in CVPR, 2013.
[25] X. Li, Y.-J. Zhang, B. Shen, and B.-D. Liu, “Image tag completion by low-rank factorization with dual reconstruction structure preserved,” in ICIP, 2014.
[26] D. Rafailidis, A. Axenopoulos, J. Etzold, S. Manolopoulou, and P. Daras, “Content-based tag propagation and tensor factorization for personalized item recommendation based on social tagging,” ACM Transactions on Interactive Intelligent Systems, vol. 3, no. 4, p. 26, 2014.
[27] Z. Feng, S. Feng, R. Jin, and A. K. Jain, “Image tag completion by noisy matrix recovery,” in ECCV, 2014.
[28] C. Deng, R. Ji, W. Liu, D. Tao, and X. Gao, “Visual reranking through weakly supervised multi-graph learning,” in ICCV, 2013.
[29] B. Harwood and T. Drummond, “Fanng: fast approximate nearest neighbour graphs,” in CVPR, 2016.
[30] I. Suzuki and K. Hara, “Centered knn graph for semi-supervised learning,” in ACM SIGIR, 2017.
[31] J. Tang, R. Hong, S. Yan, T.-S. Chua, G.-J. Qi, and R. Jain, “Image annotation by knn-sparse graph-based label propagation over noisily tagged web images,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 2, p. 14, 2011.
[32] B. Chen, J. Wang, Q. Huang, and T. Mei, “Personalized video recommendation through tripartite graph propagation,” in ACM MM, 2012.
[33] J. Song, Y. Yang, Z. Huang, H. T. Shen, and J. Luo, “Effective multiple feature hashing for large-scale near-duplicate video retrieval,” IEEE Transactions on Multimedia, vol. 15, no. 8, pp. 1997–2008, 2013.
[34] M. Norouzi, A. Punjani, and D. J. Fleet, “Fast exact search in hamming space with multi-index hashing,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 6, pp. 1107–1119, 2014.
[35] W. Liu, C. Mu, S. Kumar, and S.-F. Chang, “Discrete graph hashing,” in NIPS, 2014.
[36] Q.-Y. Jiang and W.-J. Li, “Scalable graph hashing with feature transformation,” in IJCAI, 2015.
[37] W. Liu, J. He, and S.-F. Chang, “Large graph construction for scalable semi-supervised learning,” in ICML, 2010.
[38] M. Wang, W. Fu, S. Hao, D. Tao, and X. Wu, “Scalable semi-supervised learning by efficient anchor graph regularization,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 7, pp. 1864–1877, 2016.
[39] S. Kim and S. Choi, “Multi-view anchor graph hashing,” in ICASSP, 2013.
[40] W. Liu, J. Wang, and S.-F. Chang, “Robust and scalable graph-based semisupervised learning,” Proceedings of the IEEE, vol. 100, no. 9, pp. 2624–2638, 2012.
[41] B. Xu, J. Bu, C. Chen, C. Wang, D. Cai, and X. He, “Emr: A scalable graph-based ranking model for content-based image retrieval,” IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 102–114, 2015.
[42] Y. Xiong, W. Liu, D. Zhao, and X. Tang, “Face recognition via archetype hull ranking,” in ICCV, 2013.
[43] Y. Wu, M. Pei, M. Yang, J. Yuan, and Y. Jia, “Robust discriminative tracking via landmark-based label propagation,” IEEE Transactions on Image Processing, vol. 24, no. 5, pp. 1510–1523, 2015.
[44] I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” in ACM KDD, 2001.
[45] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NIPS, 2012.
[46] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM review, vol. 51, no. 3, pp. 455–500, 2009.
[47] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, p. 788, 1999.
[48] J. Tang, X. Shu, Z. Li, G.-J. Qi, and J. Wang, “Generalized deep transfer networks for knowledge propagation in heterogeneous domains,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 12, no. 4s, p. 68, 2016.
[49] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng, “Nus-wide: a real-world web image database from national university of singapore,” in