Leveraging User Diversity to Harvest Knowledge on the Social Web
Jeon-Hyung Kang
Information Sciences Institute
University of Southern California
Marina del Rey, California 90292
Email: [email protected]
Kristina Lerman
Information Sciences Institute
University of Southern California
Marina del Rey, California 90292
Email: [email protected]
Abstract—Social web users are a very diverse group with varying interests, levels of expertise, enthusiasm, and expressiveness. As a result, the quality of content and annotations they create to organize content is also highly variable. While several approaches have been proposed to mine social annotations, for example, to learn folksonomies that reflect how people relate narrower concepts to broader ones, these methods treat all users and the annotations they create uniformly. We propose a framework to automatically identify experts, i.e., knowledgeable users who create high quality annotations, and use their knowledge to guide folksonomy learning. We evaluate the approach on a large body of social annotations extracted from the photosharing site Flickr. We show that using expert knowledge leads to more detailed and accurate folksonomies. Moreover, we show that including annotations from non-expert, or novice, users leads to more comprehensive folksonomies than experts' knowledge alone.
I. INTRODUCTION
Knowledge production is no longer solely in the hands of professionals: on many Social Web sites ordinary people create and annotate a wide variety of content. On the social photosharing site Flickr (http://flickr.com), for example, users can publish photographs, tag them with descriptive keywords, such as insect or macro, and organize them within personal directories. While an individual's annotations express her particular world view, collectively social annotations provide valuable evidence for harvesting social knowledge, including folksonomies (folk + taxonomies) that show how people relate broader concepts to narrower ones. Social knowledge is idiosyncratic and may at times conflict with knowledge expressed in professionally curated taxonomies. For example, many people consider spiders to be insects, at odds with the Linnean taxonomy of living organisms. However, such knowledge is necessary to make sense of and leverage user-generated content on the Social Web. Thus, to find all images of spiders, you will sometimes have to look for insects.

Recently, Plangprasopchok et al. [1] proposed a method to learn folksonomies by integrating structured annotations from many users, specifically, personal directories created by individual Flickr users to organize their photos. The method extends affinity propagation [2] to use structural information to concurrently combine many shallow personal directories into a larger common taxonomy. The method assumes that the quality of annotation from all users is the same. However, Social Web users are highly diverse and vary in their degree of expertise and expressiveness. Knowledgeable users create high quality, detailed annotations, often using technical terms. They specify intermediate concepts within multi-level directories, e.g., linking jumping spider to spiders to arachnids to invertebrates. We call such users experts. Novice users, on the other hand, are far less expressive, creating shallow directories that jump granularity levels, e.g., linking spiders to bugs. Using experts' knowledge enables us to learn more accurate and detailed folksonomies.

Diversity is important for groups and organizations [3]. It can lead to better group decision making and organizational robustness [4], as long as individual knowledge and opinions are aggregated correctly [5]. Hence, identifying experts from the content they create, or from recommendations of other people, has been an active research area. Previous works used natural language analysis [6], [7] and topic modeling [8] techniques to identify experts from the text of documents they created, often combining it with analysis of the structure of links within an organization [9], [10]. Annotations on the Social Web can help identify diverse classes of users. However, while previous researchers classified users based on their annotation practices [11], they did not attempt to automatically distinguish expert from novice users.

In this paper we propose methods to automatically identify expert users who provide high quality annotations and leverage their knowledge in folksonomy learning. First, in Section II, we describe and evaluate a method that examines structured annotations to automatically identify expert users. Specifically, our method analyzes the structure and content of personal directories created by Flickr users. In Section III we extend the inference method of Plangprasopchok et al. [1] to use experts' knowledge to guide the folksonomy learning process. In Section IV we show that the inference method that exploits user diversity by putting greater weight on annotations created by experts can learn more accurate and detailed folksonomies than one that ignores diversity. Surprisingly, however, we show that while experts' knowledge is required to learn more accurate folksonomies, novice knowledge is needed to learn more complete folksonomies. We also carry out a detailed investigation of the robustness of our method.
II. IDENTIFYING EXPERT USERS
Experts are knowledgeable individuals who can answer questions within organizations and generate high quality data. Identifying such people is an important research topic in data mining, management science, and social network analysis. Researchers have proposed a variety of algorithms for automatic expert identification, including language [7], probabilistic topic-based [8] and statistical [6] models and network analysis tools [12], [10], that identify experts based on the documents or email messages they exchange within organizations. Hybrid approaches that combine topics and relationships between users [9] have also been explored.

Expert identification is even more important for mining user-generated content, since Social Web users form an extremely diverse group, with widely varying levels of expertise and enthusiasm for different topics. As a result, the quality of data they create also varies tremendously. One way to differentiate data quality is by identifying expert users. We extend the features used to measure diversity in groups [3] and use them within a supervised expert classification method. The features measure users' expertise based on the structure of annotations they create. Unlike previous works that examined (textual) data people create, our method looks directly at knowledge structures they express through annotations.
A. Structured Annotations
Social web sites allow users to annotate content they create or share with others. In addition to tagging content, some sites also allow users to organize it hierarchically. Del.icio.us users can group related tags into bundles, and Flickr users can group related photos into sets and then group related sets in collections, thereby creating personal directories to organize photos. The sites themselves do not impose any rules on the vocabulary or semantics of directories; in practice users employ them to represent relations between broader and narrower categories or concepts. Personal directories offer rich evidence for harvesting social knowledge and have been used to learn communal taxonomies of concepts, otherwise known as folksonomies [13], [1].

Following Plangprasopchok et al. [13] we call a directory a user creates to organize photos on Flickr a sapling. The root node of the sapling corresponds to a user's collection, and inherits its name, while the leaves correspond to the collection's constituent sets (or other collections) and inherit their names. (Saplings are not always tree-like; in these cases we convert them to trees.) The photos the user assigns to a set are tagged, and we propagate these tags to sets and to their parent collections. While most users create shallow saplings consisting of a top-level collection and constituent sets (see Fig. 1(a)), others create detailed, multi-level hierarchies about a topic of interest (Fig. 1(b)). We call the latter users experts and the former novices.

Fig. 1. Saplings created by (a) novice and (b) expert users.

By manually inspecting saplings created by Flickr users, we found that structure and semantic consistency are two important factors distinguishing expert from novice users. Specifically, we have identified the following hallmarks of an expert:

• generally creates many saplings with distinct concepts
• creates deep (more than two levels, as in Fig. 1(b)) or broad saplings
• provides top-level concepts that are meaningful to others. Overly-broad concepts, such as 'life', 'things', 'misc', 'all sets', etc., imply novice users
• does not jump many levels (e.g., attach 'los angeles' to 'world') nor mix concepts of different granularity level (e.g., 'table mountain' and 'equatorial guinea' are never siblings, as in Fig. 1(a))
• does not create conflicts (e.g., attach 'los angeles' to 'journey' in one sapling while attaching 'journey' to 'los angeles' in another)
• does not create multiple child concepts with the same name (e.g., five 'los angeles' sets under 'journey').

B. Features
To automate expert identification, we convert the observations above into quantitative features. We divide the features into two classes: user-level and sapling-level features.
1) User Features:
Experts express a variety of concepts.
User-Variety measures the number of saplings (N) a user creates, and NumTwigs the number of relations.
User-Balance measures how uniform the saplings are in size. We measure this by entropy:

B_U = \frac{-\sum_i p_i \ln p_i}{\ln N},

where p_i is the number of nodes in sapling i divided by the total number of nodes the user creates.

User-Disparity measures differences between concepts expressed in a user's saplings [3]. We compute disparity using the Jensen-Shannon divergence between the tag distributions of pairs of saplings:

\sum_{i,j} JS(\tau_i \| \tau_j) = \sum_{i,j} \left( 0.5\, D(\tau_i \| \tau_k) + 0.5\, D(\tau_j \| \tau_k) \right),   (1)

where \tau_i represents the tag distribution of sapling i, \tau_k = \frac{1}{2}(\tau_i + \tau_j), and D(\cdot) is the Kullback-Leibler divergence. DisparityNormalized simply divides the above measure by the number of nodes in the saplings.

Fig. 2. Frequency distribution of distinct children of root nodes (a) 'nature' and (b) 'other stuff'.
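As a concrete illustration, here is a minimal sketch of the disparity computation in Python, assuming each sapling's tags have already been aggregated into probability vectors over a shared vocabulary; the function names are ours, not the paper's:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence JS(p||q) = 0.5*D(p||m) + 0.5*D(q||m),
    with m = (p + q)/2 and D the Kullback-Leibler divergence."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def user_disparity(tag_dists):
    """User-Disparity: sum of pairwise JS divergences over the tag
    distributions of one user's saplings."""
    return sum(js_divergence(ti, tj)
               for i, ti in enumerate(tag_dists)
               for tj in tag_dists[i + 1:])
```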
2) Sapling Features:
Experts express detailed knowledge in particular topics, not necessarily all topics.
Sapling-Variety combines the depth and breadth of the sapling: V_S = \sum_{i=1}^{L} i \times n_i, where L is the depth of the sapling and n_i is the number of nodes at level i. This gives more credit to deeper representations if both saplings are equally large.

Sapling-Balance measures how balanced the sapling is at each level. We quantify balance by normalized entropy based on the expected number of nodes at the current level given the number of nodes at the previous level: B_S = \frac{1}{L} \sum_{i=1}^{L} \frac{-\sum_j p_{ij} \ln p_{ij}}{\ln n_i}, where n_i is the number of nodes at level i, and p_{ij} is the proportion of children of the j'th node at level i. For example, if there are 4 nodes in level 1 with 3, 3, 1, 2 children respectively, then n_1 is 4 and p_{1j} is (3/9, 3/9, 1/9, 2/9). To balance level 2, we would need between two and three children per parent.

Several features measure concept consistency and node uniqueness. Inconsistency can be computed by the number of conflicts (i.e., attaching node A to node B in one sapling and B to A in another sapling); agreement is quantified by how many users create the same parent-child relation; node (or twig) uniqueness is computed by the ratio of unique node names to the total number of nodes in the sapling. Other features include sapling depth, breadth, number of nodes and terminal leaves it has, and the ratio of the number of leaves to the total number of nodes in the sapling.

Root-Diversity is an important hallmark of experts. Experts create generalizable knowledge using categories that are meaningful to others. A vague concept, such as 'misc', 'other', 'things', will mean different things to different people. Consequently, there will be little agreement about the child concepts of such root nodes, with every user specifying a different child. There is far more agreement about the children of more specific concepts, such as 'europe'. We quantify the generalizability of a concept by the distribution of distinct child nodes across all users.

Given a concept (sapling root), we extract all sub-concepts users have specified as children of this root. Figure 2 shows the distributions of unique children of the roots 'nature' and 'other stuff', sorted by frequency of occurrence. A peaked distribution (Fig. 2(a)) indicates agreement among users about sub-concepts and implies that the root concept is meaningful to others. A flat distribution (Fig. 2(b)) implies there is little agreement about the root concept, with practically each user expressing a different sub-concept. This indicates that the root concept is vague. We quantify the peakedness of the distribution by measuring how many unique nodes are needed to cover 30%, 50% and 70% of child nodes. For example, to cover 70% of the distinct children of the root 'europe', we need to look at 21.3% of the most frequent children, while to cover the same fraction of children of 'other stuff', we need to look at 64.6% of the most frequent children. Other root concepts in our data set that are meaningful to many users include 'nature', 'animal', 'flower', 'bird', 'usa', 'sport', while the vaguer, less meaningful concepts include 'location', 'subject', 'everything else', 'landscape', 'random', 'stuff', and 'miscellaneous'. Other features characterizing root diversity include the number of people who have created a root node with that name, and the number of unique children the root has over all users.
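The sapling-level measures are straightforward to compute once a sapling is represented by its per-level child counts. A hedged sketch (the input encoding is our assumption, not the paper's):

```python
import math

def sapling_features(levels):
    """Sapling-Variety and Sapling-Balance for one sapling.

    `levels[i]` lists the number of children of each node at level i+1,
    e.g. [[3, 3, 1, 2]] for four level-1 nodes with 3, 3, 1 and 2 children.
    """
    L = len(levels)
    # Sapling-Variety: V_S = sum_i i * n_i credits deeper levels more.
    variety = sum((i + 1) * len(children) for i, children in enumerate(levels))
    # Sapling-Balance: average normalized entropy of child proportions per level.
    balance = 0.0
    for children in levels:
        n_i, total = len(children), sum(children)
        if n_i < 2 or total == 0:
            continue  # entropy is undefined for a single node or no children
        probs = [c / total for c in children if c > 0]
        entropy = -sum(p * math.log(p) for p in probs)
        balance += entropy / math.log(n_i)
    return variety, (balance / L if L else 0.0)
```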
C. Automatically Identifying Experts
We collected saplings created by 7,121 Flickr users who were members of wildlife and nature photography public groups. We trained a model to use the features above to automatically identify experts among these users. We trained the model on a small set of manually labeled data and used it to label a larger test set. We then examined and labeled new predictions made by the model, added them to the training set and retrained the model. We iterated this self-training procedure on the unlabeled test data to discover new experts, and re-trained the model with the enriched data.

To create the initial training set, we asked three annotators to review saplings created by 200 Flickr users randomly selected from the set of 1000 who specified the most relations. Annotators used the criteria above to identify experts. Each user's saplings were laid out hierarchically using the yEd graph visualization tool. Annotators identified 20–45 experts among the 200 users. We treated the 19 experts all annotators agreed upon as positive, and the rest as negative, examples in the training set.
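The self-training loop can be sketched as follows; for brevity the sketch uses a single SVM rather than the three models we aggregate, and `annotate` is a hypothetical stand-in for the manual labeling step:

```python
from sklearn.svm import SVC

def self_train(X, y, X_unlabeled, annotate, n_rounds=8):
    """Iterative self-training: fit, take positive predictions on
    unlabeled users, have an annotator verify them, fold the verified
    examples into the training set, and refit."""
    X, y = list(X), list(y)
    model = SVC()
    for _ in range(n_rounds):
        model.fit(X, y)
        flagged = [i for i, p in enumerate(model.predict(X_unlabeled)) if p == 1]
        if not flagged:
            break
        for i in flagged:
            X.append(X_unlabeled[i])
            y.append(annotate(X_unlabeled[i]))  # manual expert/novice label
        X_unlabeled = [x for i, x in enumerate(X_unlabeled)
                       if i not in set(flagged)]
    return model
```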
We trained three different models (J48 [14], Random Forest [15], and LibSVM [16]) on the training set of 200 labeled users, and applied the models to classify unlabeled test data. We aggregated positive predictions made by all three models, manually labeled them, and iterated the procedure. Table I reports results of cross validation at each iteration. We reached 100% precision, 88% recall and a 93% f-score with LibSVM after eight iterations and stopped at this point. After eight iterations, our training set had 315 users, of which 43 were experts. Note that only a small fraction of all users can be classified as experts. Self-training enabled us to enrich the training set with positive examples without having to label thousands of users. Results of 10-fold cross validation of LibSVM on labeled data were 84% precision, 65% recall, and a 74% f-score. Applying the final model to the entire data set identified 66 experts in total.

TABLE I: J48, RANDOM FOREST, AND LIBSVM MODEL CROSS VALIDATION RESULTS AT EACH ITERATION. THE SIZE OF THE TRAINING SET INCREASES AT EACH ITERATION AS POSITIVE PREDICTIONS MADE BY THE MODEL ARE ADDED TO THE TRAINING SET.

Iter   J48 (Pr/Re/F)    Random Forest (Pr/Re/F)   LibSVM (Pr/Re/F)   Training examples   Positive examples
1      0.44/0.58/0.50   0.67/0.42/0.52            0.80/0.63/0.70     200                 19
2      0.56/0.50/0.53   0.61/0.50/0.55            0.76/0.42/0.54     274                 38
3      0.56/0.42/0.49   0.57/0.48/0.52            0.86/0.57/0.69     292                 42
4      0.51/0.55/0.53   0.50/0.53/0.45            0.88/0.71/0.78     293                 42
5      0.44/0.44/0.44   0.57/0.47/0.51            0.80/0.58/0.67     297                 43
6      0.50/0.41/0.45   0.54/0.34/0.42            0.84/0.50/0.63     292                 43
7      0.42/0.39/0.40   0.57/0.39/0.46            0.88/0.66/0.75     311                 43
8      0.61/0.49/0.55   0.79/0.35/0.48            1.00/0.88/0.93     315                 43

To see which features are important, we used four feature selection algorithms: SVM Attribute Evaluation [17], Relief for Attribute Estimation [18], Information Gain Attribute Evaluation [?], and Chi Squared Attribute Evaluation [19]. The SVM Attribute Evaluation method bases its decision function on the support vectors of the borderline cases, while the others base their decisions on the average cases. This difference leads to different rankings of features. Relief evaluates the importance of a feature by repeatedly sampling an instance and estimating how well feature values distinguish among instances near each other. Table II reports how different features are ranked by these algorithms. All methods identify sapling depth as the most important feature for identifying experts. All methods besides SVM choose the number of leaves in the sapling, and how balanced they are within the sapling, as the next most important features. Generally, sapling-level features are judged to be more important than user-level features by all methods, similar to the intuitions of human annotators.

TABLE II: FEATURE SELECTION RESULTS, WITH FEATURES SORTED BY THEIR AVERAGE RANK.

Feature name                          SVM   ReliefF   InfoGain   ChiSquared   Avg.
Sapling-Depth                          1       1         1           1         1
Sapling-Number Of Leaves               5       3         3           3         2
Sapling-Balance                       12       2         2           2         3
User-Balance                           7       4         7           7         4
Sapling-Variety                        4      14         4           4         5
Sapling-Number Of Children             9      15         6           5         6
Root-Diversity-50%                    10       8        14          14         8
User-Variety                          14      12        10          10         7
User-DisparityNormalized               8      13        13          13         9
Number Of Twigs                       11      20         8           9        10
Sapling-Number of Unique Twig Ratio    3      16        16          16        13
Sapling-Number of Unique Term Ratio    6      10        18          18        14
Sapling-Number of Conflicts            2      22        19          19        19
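For readers who want to reproduce a comparable ranking, two of the four criteria (chi-squared and an information-gain style score) are available in scikit-learn; this sketch averages their ranks and is not the exact tooling used in the paper:

```python
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif

def rank_features(X, y, names):
    """Rank features by average rank under chi-squared and a mutual-
    information (information-gain style) criterion."""
    chi_scores, _ = chi2(X, y)            # X must be non-negative
    mi_scores = mutual_info_classif(X, y)
    rank = lambda s: np.argsort(np.argsort(-s))  # rank 0 = highest score
    avg_rank = rank(chi_scores) + rank(mi_scores)
    return [names[i] for i in np.argsort(avg_rank)]
```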
III. USING EXPERT KNOWLEDGE IN FOLKSONOMY LEARNING

Plangprasopchok et al. [1] proposed a method to learn folksonomies by clustering many saplings created by different users. Their relational affinity propagation (RAP) is a probabilistic method for clustering structured data into a common deeper and bushier tree. RAP merges root nodes of different saplings to extend the breadth of the learned folksonomy, and it merges a child node of one sapling to the root of another to extend its depth. RAP is based on affinity propagation (AP) [2], and it identifies a set of exemplars that best represent all the data. Exemplars emerge as messages are passed between data items, with each item seeking an assignment to the most similar exemplar. AP identifies a set of exemplars, or clusters, which maximize the net similarity between exemplars and the data items assigned to them.

Fig. 3. Relational affinity propagation (RAP): (a) two saplings being merged. Dashed lines surround a group of nodes assigned to the same exemplar (in orange). (b) Binary variable matrix corresponding to the configuration in (a). (c) Factor graph formulation of binary RAP.

Following the binary AP framework of [20], let c be an N × N matrix, where N is the number of data items. A binary variable c_{ij} = 1 if node (data item) i is assigned to node j (i.e., j is an exemplar of i); otherwise, c_{ij} = 0. AP uses constraints to guide the inference process to ensure cluster consistency. The first constraint, I_i, which is imposed on row i, indicates that a data item can belong to only one exemplar (\sum_j c_{ij} = 1). The second constraint, E_j, which is imposed on column j, indicates that if an item other than j chooses j as its exemplar, then j must be its own exemplar (c_{jj} = 1). AP avoids assigning exemplars which violate these constraints.

A similarity function S(\cdot) measures the similarity of a node to its exemplar. If c_{ij} = 1, then we add S(c_{ij}) to the objective function; otherwise, S(c_{ij}) = 0. The self-similarity, S(c_{jj}), also called the preference, is usually set to less than the maximum similarity value in order to avoid creating a configuration with N exemplars. In general, the higher the value of the preference for a particular item, the more likely it is to become an exemplar. Setting all preferences to the same value indicates that all items are equally likely to become exemplars. The global objective function measures the quality of a configuration (i.e., exemplars and items assigned to them):

S(c_{11}, \dots, c_{NN}) = \sum_{i,j} S_{ij}(c_{ij}) + \sum_i I_i(c_{i1}, \dots, c_{iN}) + \sum_j E_j(c_{1j}, \dots, c_{Nj}).   (2)

A message passing algorithm [2] is used to find a configuration that maximizes the net similarity without violating the I and E constraints.

A. Relational Affinity Propagation

In order to cluster structured data into a tree, Plangprasopchok et al. [1] introduced a new "single parent" constraint. The F-constraint allows a node to select another as an exemplar only if their parents belong to the same exemplar, thus ensuring that the learned structure forms a tree. Consider clustering the structured data in Fig. 3(a), where exemplars are in orange, and dashed lines surround nodes assigned to the same exemplar. When child nodes i and k decide whether to merge with node j, the F-constraint checks whether their parents h and m belong to the same exemplar. Figure 3(b) shows the binary variable matrix corresponding to the configuration in (a). This configuration is undesirable since it does not correspond to a tree: nodes i and k are assigned to exemplar j, but their parents belong to different exemplars.

In its original formulation, the F-constraint was imposed on child nodes only and could result in undesirable configurations. The F-constraint checks whether i and k can be assigned to j, and since they cannot, it forces them into separate clusters.
While the configuration is valid, it leads to a shallow folksonomy. We modify the F-constraint to prevent such situations. The modified F-constraint is imposed on both child and parent nodes, if the parent node is also an exemplar:

F_j(c_{1j}, \dots, c_{Nj}) =
\begin{cases}
-\infty & \text{if } \exists \text{ child } i : c_{ij} = 1 \text{ and } ex(pa(i)) \neq ex(pa(ne(j))) \\
0 & \text{otherwise}
\end{cases}

where ne(\cdot) returns the set of nodes that share the exemplar of its argument, pa(\cdot) returns the index of the parent of its argument, and ex(\cdot) returns the index of the argument's exemplar. In the illustration in Fig. 3, suppose that k is found to be similar enough to j so that they can be merged. To decide whether i too can choose j as an exemplar, the modified F-constraint checks whether the parent exemplar of node i is the same as the parent exemplar of any of j's neighbors. If not, i won't be able to pick j as an exemplar. The objective function in Eq. 2 is modified by the addition of the new term \sum_j F_j(c_{1j}, \dots, c_{Nj}); we use the max-sum method to optimize it.

B. Integrating Expert Knowledge
RAP provides a framework to integrate experts' knowledge in folksonomy learning. We do this simply by giving the nodes from saplings created by experts higher preference, or self-similarity, values. This means that these nodes will be more likely to become exemplars, and expert knowledge will guide the folksonomy learning process.
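In code, this amounts to writing weighted preferences onto the diagonal of the similarity matrix before message passing begins. A minimal sketch (the helper name and the NumPy representation are our assumptions):

```python
import numpy as np

def set_preferences(S, expert_nodes, weight=2.0):
    """Set AP preferences (the diagonal of similarity matrix S): the mean
    similarity for ordinary nodes, `weight` times the mean for nodes drawn
    from saplings created by identified experts."""
    n = len(S)
    mean_sim = S[~np.eye(n, dtype=bool)].mean()  # mean off-diagonal similarity
    prefs = np.full(n, mean_sim)
    prefs[list(expert_nodes)] *= weight
    np.fill_diagonal(S, prefs)
    return S
```

With `weight=1.0` this reduces to ordinary RAP, which treats all users uniformly; `weight=2.0` matches the 2*mean strategy evaluated in Section IV.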
C. Implementing RAP
Binary RAP may be written as the factor graph shown in Fig. 3(c). Following Ref. [21] and Ref. [1], we derived message update formulas for β, η, α, ρ, τ and σ:

\beta_{ij} = s(i,j) + \alpha_{ij} + \tau_{ij},   (3)

\eta_{ij} = -\max_{k \neq j} \beta_{ik},   (4)

\alpha_{ij} =
\begin{cases}
\sum_{k \neq j} \max[\rho_{kj}, 0] & i = j \\
\min[0, \rho_{jj} + \sum_{k \notin \{i,j\}} \max[\rho_{kj}, 0]] & i \neq j
\end{cases}   (5)

\rho_{ij} = s(i,j) + \eta_{ij} + \tau_{ij},   (6)

\tau_{ij} =
\begin{cases}
\sum_{k \neq j,\, k \in S\{ne(j)\}} \max[\sigma_{kj}, 0] & i = j \\
\min[0, \rho_{jj} + \sum_{k \notin \{i,j\},\, k \in S\{ne(j)\}} \max[\sigma_{kj}, 0]] & i \neq j
\end{cases}   (7)

\sigma_{ij} = s(i,j) + \eta_{ij} + \alpha_{ij}.   (8)

In Eqs. (7) and (8), S\{ne(j)\} represents the set of nodes sharing the same parent exemplar as the neighbors of j. Note that we do not need to check all neighbors of j, but just one child node among all neighbors, since all nodes in ne(j) must already share a parent exemplar. These message update equations make our model favor the valid configuration which maximizes the objective function S(c_{11}, \dots, c_{NN}). Since message passing algorithms can be written in max-sum form, they can be easily parallelized on multi-core computers [22]. We implemented the message update formulas using the map-reduce parallel programming framework [23], which ran on a 30+ node cluster.
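For reference, the sketch below implements plain affinity propagation [2], the unconstrained core on which RAP builds; the RAP-specific τ and σ messages enforcing the F-constraint are omitted, and the damping factor is a standard convergence aid rather than part of the formulas above:

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.5):
    """Plain affinity propagation (Frey & Dueck [2]) on a similarity
    matrix S whose diagonal holds the preferences."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities (rho)
    A = np.zeros((n, n))  # availabilities (alpha)
    for _ in range(max_iter):
        # rho(i,j) = s(i,j) - max_{k != j} [alpha(i,k) + s(i,k)]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * Rnew
        # alpha(i,j) = min(0, rho(j,j) + sum_{k not in {i,j}} max(0, rho(k,j)))
        # alpha(j,j) = sum_{k != j} max(0, rho(k,j))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        Anew = Rp.sum(axis=0)[None, :] - Rp
        diag = np.diag(Anew).copy()
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, diag)
        A = damping * A + (1 - damping) * Anew
    return np.flatnonzero(np.diag(R + A) > 0)  # indices of exemplars
```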
IV. EXPERIMENTAL RESULTS

We measured the impact of expert knowledge on the folksonomies learned from Flickr data. Our data set consists of 20,759 saplings created by 7,121 users. A node can be a collection or a set. The tags of all photos within a set are assigned to the set node and propagated to the collection node. We stemmed all terms (tags, set and collection names) using the Porter stemming algorithm and measured the similarity between a pair of nodes i and j by the number of common tags t_{ij} they have among their top 40 tags: S(i,j) = \min(1, t_{ij}/c), for a fixed normalization constant c. We infer exemplars and clusters by initializing all messages to zero, and update exemplar assignments at each iteration until convergence. We check convergence by monitoring the number of exemplars and the stability of the net similarity.

We selected 31 seed terms consistent with Ref. [1] and generated folksonomies for these seed terms using RAP with and without expert knowledge. To learn a folksonomy, we first need to select relevant saplings from the data set. We created a snowball sample of relevant saplings as follows. For the seed term that will be the root of the learned folksonomy, first we retrieve all saplings whose root has the same name as the seed term. We then retrieve saplings whose root has the same name as one of the children in the first set of saplings, and so on. We include expert knowledge in one of two ways: (1) using the snowball sample of relevant saplings, including those created by the 66 experts the model identified; (2) in addition to these, use all saplings created by the experts in the snowball sample.

Besides varying the amount of expert knowledge used by the learning algorithm, we can also vary its weight. We used the following strategies to vary the emphasis placed on expert knowledge: (1) treat all users uniformly by setting the preference values of all nodes to the mean of similarity scores (ordinary RAP); (2) set the preference values of expert nodes to twice the mean, while all other preference values are set to the mean.

Fig. 4. Folksonomies for 'africa' learned (a) without and (b) with expert knowledge (expert nodes in orange).

As an illustration, consider the portion of the 'Africa' folksonomy shown in Fig. 4(a), learned using saplings such as those in Fig. 1, but without differentiating between expert and novice users. The root has a child 'Christmas', because some people spent their Christmas holidays in Africa. Since 'Christmas' is linked to many other concepts such as 'family', 'card', etc., it introduces irrelevant concepts into the 'Africa' folksonomy. Figure 4(b) shows the portion of the 'Africa' folksonomy learned with expert knowledge. Now the nodes ('xmas', 'family', 'card', etc.) originally placed under 'Africa' → 'Christmas' were moved to 'holiday' → 'Christmas'. Moreover, 'Table Mountain' and other nodes under 'Africa' → 'Cape Town' were moved under 'Africa' → 'South Africa' → 'Cape Town'. As we can see from this illustration, adding expert knowledge helps produce a more relevant and detailed folksonomy.
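Returning to the sapling-selection step, the snowball sampling described above can be summarized in a few lines; the (root_name, child_names) representation of a sapling is our assumption:

```python
from collections import deque

def snowball_sample(saplings, seed):
    """Snowball sample of relevant saplings: start from saplings whose
    root matches the seed term, then pull in saplings rooted at any of
    their children, and so on. Each sapling is given as a
    (root_name, child_names) pair of stemmed strings."""
    selected, frontier, seen = [], deque([seed]), {seed}
    while frontier:
        term = frontier.popleft()
        for root, children in saplings:
            if root != term:
                continue
            selected.append((root, children))
            for child in children:
                if child not in seen:
                    seen.add(child)
                    frontier.append(child)
    return selected
```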
A. Automatic Evaluation

Table III reports results of running RAP in three different settings for 31 seed terms: (M1) relevant saplings collected by the snowball sample with no differentiation between novice and expert users (all preference values set to the mean); (M2) using relevant saplings plus all other saplings from experts, with no differentiation between users (mean+EXP); (M3) the same saplings as before, but with higher preference values for experts (2*mean+EXP). While the learning algorithm generally produces several trees, we evaluate only the most 'popular' tree, one that aggregates the greatest number of saplings. The popular tree learned by M1 contained between 14 and 7925 nodes (2001.26 on average), and that learned by M2 between 16 and 8114 nodes (1947.87 on average), while folksonomies learned by M3 were smaller, between 14 and 5667 nodes (1292.81 on average).

We automatically measure the quality of the learned folksonomies by comparing them to the reference taxonomy from the Open Directory Project (ODP) [24]. We applied two metrics: Lexical Precision (LP) and Taxonomic Overlap (TO) [25]. LP measures term overlap between the learned and reference taxonomies, independent of their structure, while TO measures the overlap of ancestors and descendants of a pair of terms from the learned and reference taxonomies without considering their order. We also measure the depth of the taxonomy. We observe that while RAP leads to few or no structural inconsistencies, integrating expert knowledge into the learning process improves the quality of the learned taxonomies (higher LP and TO scores) and how detailed they are (greater depth), while also removing irrelevant nodes (smaller trees).

TABLE III: EVALUATION OF FOLKSONOMIES LEARNED FOR 31 (STEMMED) SEED TERMS.

seed              M1: mean (dp/LP/TO)   M2: mean+EXP (dp/LP/TO)   M3: 2*mean+EXP (dp/LP/TO)   %EXP
reptil            3/0.857/0.841         3/0.857/0.8412            2/1.000/0.9199              14.28
invertebr         3/0.197/0.599         4/0.181/0.5792            4/0.183/0.6272              15.07
central america   3/0.134/0.586         3/0.130/0.5866            3/0.130/0.5821              9.8
cat               3/0.024/0.587         3/0.032/0.7052            3/0.472/0.8065              0
south africa      3/0.019/0.389         3/0.068/0.5022            3/0.060/0.478               19.51
africa            3/0.396/0.610         4/0.379/0.6109            4/0.457/0.671               30
craft             3/0.155/0.441         5/0.263/0.486             5/0.155/0.4157              1.1
fish              3/0.079/0.335         5/0.072/0.3261            3/0.174/0.4719              0
dog               4/0.014/0.496         4/0.013/0.4796            4/0.020/0.6661              0
build             3/0.037/0.366         5/0.003/0.3508            5/0.004/0.3714              2.77
north america     3/0.217/0.466         5/0.228/0.4116            6/0.265/0.444               14.13
south america     3/0.095/0.416         4/0.234/0.5394            4/0.292/0.5991              18.21
australia         3/0.171/0.541         4/0.258/0.5612            4/0.179/0.5789              6.18
insect            5/0.027/0.349         4/0.027/0.2901            4/0.032/0.3721              4.96
flora             3/0.127/0.450         3/0.127/0.4504            3/0.131/0.4523              3.52
vertebr           4/0.034/0.390         4/0.034/0.3892            3/0.273/0.5986              17.5
urban             4/0.061/0.394         4/0.061/0.3942            4/0.061/0.3946              2.64
unit state        4/0.038/0.525         4/0.038/0.5203            4/0.038/0.5236              7.93
bird              3/0.051/0.397         5/0.052/0.3996            5/0.058/0.4497              3.97
plant             3/0.115/0.461         6/0.124/0.475             3/0.351/0.584               6.25
canada            4/0.039/0.305         6/0.038/0.301             4/0.075/0.4595              6.11
unit kingdom      3/0.219/0.583         5/0.231/0.6005            6/0.216/0.5753              5.49
asia              4/0.052/0.449         6/0.055/0.4379            5/0.056/0.4676              11.8
sport             4/0.208/0.444         6/0.226/0.4328            6/0.263/0.4575              16.46
europ             4/0.252/0.535         5/0.249/0.5333            5/0.276/0.5382              10.91
fauna             4/0.240/0.438         4/0.261/0.4448            5/0.264/0.4549              11.26
countri           4/0.075/0.530         7/0.086/0.4777            7/0.118/0.515               15.11
anim              4/0.054/0.446         6/0.059/0.4328            7/0.112/0.4838              9.5
flower            5/0.053/0.391         7/0.054/0.3773            7/0.053/0.3887              5.41
world             5/0.027/0.358         8/0.025/0.3137            9/0.025/0.3549              17.04
citi              5/0.005/0.500         5/0.005/0.4936            5/0.007/0.502               5.67
average           -                     -                         -                           -

Is expert knowledge alone sufficient to produce high quality folksonomies? The last column in Table III shows the percentage of nodes in the learned folksonomy that can be attributed to experts. On average, this fraction is less than 10%. We conclude that integrating knowledge from both expert and novice users leads to more comprehensive folksonomies than using expert knowledge alone.
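For concreteness, one plausible reading of the LP metric of [25] (the exact normalization used in our evaluation may differ):

```python
def lexical_precision(learned_terms, reference_terms):
    """Lexical Precision (LP): the fraction of terms in the learned
    folksonomy that also occur in the reference taxonomy, ignoring
    structure. Both arguments are collections of (stemmed) term strings."""
    learned = set(learned_terms)
    return len(learned & set(reference_terms)) / len(learned)
```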
B. Manual Evaluation

The automatic method was not comprehensive, since it can only evaluate the portions of the learned folksonomies that used the vocabulary of the reference taxonomy. Therefore, we also carried out a manual evaluation using the Coding Analysis Toolkit (CAT) [26], which provides a Web interface for users to answer customized questions. Each question presented to the user a portion of the learned folksonomy, laid out as a tree, and asked if it was correct. Since the trees were generally very large, we reduced their size as follows. For a pair of folksonomies learned by methods M1 and M3 for some seed term, we identified leaf nodes with the same name and the same ancestors in the two trees and removed them from both trees. Applying this strategy iteratively eliminated on average 50% to 70% of the nodes. If the reduced tree was still large, we segmented it into disjoint subtrees with at most 10 child nodes at any level. We asked five annotators to determine whether each reduced tree (or subtree) was correct (837 questions total). Overall, annotators judged 45.30% of the trees learned by method M1 and 68.24% learned by M3 to be correct. Thus, using expert knowledge leads to better folksonomies. We calculated the statistical significance of the results of automatic and manual annotation. We find that the difference in TO scores between RAP without and with expert knowledge is significant at the 95% level with t(31) = 2.265, p ≤ 0.05.
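The tree-reduction step used to shrink the questionnaires can be sketched as follows, assuming each tree is stored as a map from root-to-node paths to child-name lists (our representation, not the paper's):

```python
def prune_common_leaves(tree_a, tree_b):
    """Iteratively drop leaves that occur in both trees with the same name
    and the same ancestors. Each tree maps a root-to-node path (a tuple of
    node names) to the list of child names at that path."""
    leaves = lambda t: {p for p in t if not t[p]}
    while True:
        shared = leaves(tree_a) & leaves(tree_b)
        if not shared:
            return tree_a, tree_b
        for tree in (tree_a, tree_b):
            for path in shared:
                del tree[path]
                parent = path[:-1]
                if parent in tree:
                    tree[parent].remove(path[-1])
```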
C. Robustness

Fig. 5. Robustness of the proposed method, as measured by the taxonomic overlap (TO), with respect to (a) preference values and (b) percentage of experts misidentified.
Finally, we address the robustness of the method with respect to changes in the preference values assigned to expert nodes. We ran our algorithm for six preference values of the form x · mean. We report TO scores for three seeds ('invertebrate', 'africa', and 'bird') in Fig. 5(left). The quality of the learned folksonomies, as measured by TO, rises with the preference values, and saturates around x = 2.

Another question is how the accuracy of automatic expert identification affects the quality of the learned folksonomies. For this experiment, we randomly selected n% of expert nodes and swapped their preference values with those of the same number of randomly selected novice nodes. We varied the percentage of swapped nodes from 0% to 100% and report TO scores for the three learned folksonomies in Fig. 5(right). As we increased the number of swapped nodes, TO scores dropped by 9%–12%. Note that when all expert nodes were swapped for novice nodes, i.e., random novice nodes had their preference values set to 2 · mean, the TO scores were similar to those that did not differentiate between expert and novice nodes. The difference between 100% and 0% swapped is similar to RAP with and without expert knowledge, as expected. We conclude that even moderately high errors (up to 50%) in expert identification do not significantly degrade the quality of the learned folksonomies.

V. RELATED WORK
Expert identification has been addressed by researchers in several different fields. Existing works analyze the (textual) content of documents people create, the link structure of the interactions between people, or a combination of both. Zhang et al. [9] proposed a probabilistic algorithm to find experts on a given topic by using local information about a person (e.g., publications) and relationships between people. A similar approach was used by Maybury [6] to find experts within organizations from the documents (publications, publicly shared folders) they create and the relations between them (project information, citations). Balog et al. [7] used generative language models to identify experts among authors of documents, while Deng et al. [8] explored a topic-based model for finding experts in academic fields. Davitz et al. [10] used network analysis tools to identify experts based on the documents or email messages they create within their organizations. Content quality analysis in social media has also been investigated by many researchers. Agichtein et al. [27] investigated methods for measuring content quality using content and user-relationship features. Hu et al. [28] proposed a quality-assessment model using the interaction data between articles and their contributors. Our approach is similar in spirit, in that we look at the contents of data people create to identify experts, although we have not yet included relations between people in the analysis. Unlike these earlier methods, we use the structure of annotations to measure users' expertise on a topic. While Körner et al. [11] proposed a method to differentiate users in social tagging systems, they classify users as categorizers and describers based on their tag usage, and show that there is more semantic agreement between describers. They do not attempt to learn taxonomies nor differentiate the quality of annotations.

With the advent of crowdsourcing services, labeling large datasets has become easier. However, due to variations in annotators' abilities, significant post-processing is required. To address this problem, Welinder et al. [29] proposed a labeling strategy based on the estimation of the most likely value of current labels and annotators' abilities. Sheng et al. [30] studied repeated-labeling strategies to improve label quality. Our work is different in the sense that on the Social Web users freely choose the content to label, as well as the labels themselves (tags, directories), which reflect their own interest in the content. Our work is also related to broader efforts to "crowdsource" knowledge production, embodied, for example, by "citizen science" projects and "wisdom of crowds" approaches [31]. Researchers have studied methods that aggregate data of varying quality [32], [33]. However, the amount and variation of data in these studies was limited. Our approach can automatically identify the quality of data and aggregate it from thousands of users.
VI. CONCLUSION
In this paper, we propose a framework to automatically identify experts based on the linguistic and structural features of the annotations they create, and use experts' annotations to guide the folksonomy learning process. We show that using experts' knowledge can produce more accurate and detailed folksonomies. We also show that the proposed method is robust to errors in expert identification. Our work generalizes beyond Flickr to other structured data sources (eBay categories, Delicious bundles, Bibsonomy relations, file systems).

In future work, we would like to extend the automatic expert identification procedure using a Bayesian approach [31]. Expertise could be modeled as a continuous variable rather than a binary (0/1) one. By identifying experts in more detail, we could control the degree to which experts' knowledge is used. We would also like to extend RAP to apply to other structure learning problems, such as alignment of biological data. Finally, we would like to incorporate a more efficient inference algorithm and compare the approach to other statistical relational learning approaches.
ACKNOWLEDGMENT
We would like to thank Jong-hyop Kim for implementing the CAT evaluation and the annotators for evaluating the learned folksonomies. This work is based on research supported by the National Science Foundation under awards IIS-0812677 and CMMI-0753124.
REFERENCES

[1] A. Plangprasopchok, K. Lerman, and L. Getoor, "A probabilistic approach for learning folksonomies from structured data," in Proc. 4th ACM Web Search and Data Mining Conf. (WSDM), February 2011.
[2] B. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, no. 5814, p. 972, 2007.
[3] A. Stirling, "A general framework for analysing diversity in science, technology and society," J. Royal Society Interface, vol. 4, no. 15, p. 707, 2007.
[4] J. Surowiecki, The Wisdom of Crowds, 1st ed. New York, NY: Anchor Books, 2005.
[5] S. E. Page, The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton, NJ, USA: Princeton University Press, 2007.
[6] M. T. Maybury, "Knowledge on Demand: Knowledge and Expert Discovery," J. Universal Computing, vol. 8, no. 5, pp. 491+, 2002.
[7] K. Balog, L. Azzopardi, and M. de Rijke, "Formal models for expert finding in enterprise corpora," in Proc. 29th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval. ACM, 2006, pp. 43–50.
[8] H. Deng, I. King, and M. Lyu, "Formal models for expert finding on DBLP bibliography data," in Data Mining, 2008. ICDM '08. Eighth IEEE Int. Conf. on. IEEE, 2009, pp. 163–172.
[9] J. Zhang, J. Tang, and J. Li, "Expert finding in a social network," Advances in Databases: Concepts, Systems and Applications, pp. 1066–1069, 2010.
[10] J. Davitz, J. Yu, S. Basu, D. Gutelius, and A. Harris, "iLink: Search and Routing in Social Networks," in Proc. Knowledge Discovery and Data Mining Conf. (KDD-2007), 2007.
[11] C. Körner, D. Benz, M. Strohmaier, A. Hotho, and G. Stumme, "Stop thinking, start tagging - tag semantics emerge from collaborative verbosity," in Proc. 19th Int. World Wide Web Conf. (WWW 2010). Raleigh, NC, USA: ACM, Apr. 2010.
[12] C. S. Campbell, P. P. Maglio, A. Cozzi, and B. Dom, "Expertise identification using email communications," in CIKM '03: Proc. Twelfth Int. Conf. on Information and Knowledge Management. ACM Press, 2003, pp. 528–531.
[13] A. Plangprasopchok, K. Lerman, and L. Getoor, "Growing a tree in the forest: Constructing folksonomies by integrating structured metadata," in Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD), 2010.
[14] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.
[15] L. Breiman and E. Schapire, "Random forests," in Machine Learning, 2001, pp. 5–32.
[16] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," 2001.
[17] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Mach. Learn., vol. 46, pp. 389–422, March 2002.
[18] K. Kira and L. A. Rendell, "A practical approach to feature selection," in Proc. Ninth Int. Workshop on Machine Learning (ML '92). Morgan Kaufmann Publishers Inc., 1992, pp. 249–256.
[19] P. E. Greenwood and M. S. Nikulin, A Guide to Chi-Squared Testing, 1st ed. New York, NY: Wiley.
[20] I. Givoni and B. Frey, "A binary variable model for affinity propagation," Neural Computation, vol. 21, no. 6, pp. 1589–1600, 2009.
[21] C. Bishop, Pattern Recognition and Machine Learning. Springer New York, 2006, vol. 4.
[22] C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun, "Map-reduce for machine learning on multicore," in Advances in Neural Information Processing Systems 19. The MIT Press, 2007, p. 281.
[23] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Comm. ACM, vol. 51, no. 1, pp. 107–113, 2008.
[24] V. Pàmies, "Open Directory Project," 2008.
[25] K. Dellschaft and S. Staab, "On how to perform a gold standard based evaluation of ontology learning," The Semantic Web - ISWC 2006, pp. 228–241, 2006.
[26] C.-J. Lu and S. W. Shulman, "Rigor and flexibility in computer-based qualitative research: Introducing the Coding Analysis Toolkit," Int. J. Multiple Research Approaches, vol. 2, pp. 105–117, 2008.
[27] E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, "Finding high-quality content in social media with an application to community-based question answering," in Proc. WSDM, 2008.
[28] M. Hu, E.-P. Lim, A. Sun, H. W. Lauw, and B.-Q. Vuong, "Measuring article quality in Wikipedia: Models and evaluation."
[29] P. Welinder and P. Perona, "Online crowdsourcing: rating annotators and obtaining cost-effective labels," in CVPR, 2010.
[30] V. S. Sheng, F. Provost, and P. G. Ipeirotis, "Get another label? Improving data quality and data mining using multiple, noisy labelers."
[31] M. Steyvers, M. D. Lee, B. Miller, and P. Hemmer, The Wisdom of Crowds in the Recollection of Order Information. MIT Press, 2009, pp. 1785–1793.
[32] P. Hemmer, M. Steyvers, and B. Miller, "The Wisdom of Crowds with Informative Priors," in Proc. 32nd Annual Conf. of the Cognitive Science Society, 2010.
[33] J. Yu, W.-K. Wong, and R. Hutchinson, "Modeling Experts and Novices in Citizen Science Data for Species Distribution Modeling," in Proc. IEEE Int. Conf. on Data Mining (ICDM), 2010.