Semantic Stability in Social Tagging Streams
Claudia Wagner, Philipp Singer, Markus Strohmaier, Bernardo A. Huberman
Claudia Wagner (U. of Koblenz & GESIS), Philipp Singer (Graz University of Technology), Markus Strohmaier (U. of Koblenz & GESIS), Bernardo A. Huberman (HP Labs Palo Alto)
ABSTRACT
One potential disadvantage of social tagging systems is that, due to the lack of a centralized vocabulary, a crowd of users may never manage to reach a consensus on the description of resources (e.g., books, users or songs) on the Web. Yet, previous research has provided interesting evidence that the tag distributions of resources may become semantically stable over time as more and more users tag them. At the same time, previous work has raised an array of new questions such as: (i) How can we assess the semantic stability of social tagging systems in a robust and methodical way? (ii) Does semantic stabilization of tags vary across different social tagging systems and ultimately, (iii) what are the factors that can explain semantic stabilization in such systems? In this work we tackle these questions by (i) presenting a novel and robust method which overcomes a number of limitations in existing methods, (ii) empirically investigating semantic stabilization processes in a wide range of social tagging systems with distinct domains and properties and (iii) detecting potential causes for semantic stabilization, specifically imitation behavior, shared background knowledge and intrinsic properties of natural language. Our results show that tagging streams which are generated by a combination of imitation dynamics and shared background knowledge exhibit faster and higher semantic stability than tagging streams which are generated via imitation dynamics or natural language streams alone.
Categories and Subject Descriptors
H.3.4 [Information Storage and Retrieval]: Systems and Software—Information Networks
Keywords
social tagging; emergent semantics; social semantics; distributional semantics; stabilization process
1. INTRODUCTION
Instead of enforcing rigid taxonomies or ontologies with controlled vocabulary, social tagging systems allow users to freely choose so-called tags to annotate resources on the Web such as users, books or videos. A potential disadvantage of tagging systems is that, due to the lack of a controlled vocabulary, a crowd of users may never manage to reach a consensus or may never produce a semantically stable description of resources. By semantically stable we mean that users have agreed on a set of descriptors and their relative importance for a resource, both of which remain stable over time. Note that if all descriptors are equally important, users have not produced a shared and agreed-upon description of a resource; disagreement may lead to this situation where all descriptors have equally low importance. Yet, when we observe real-world social tagging processes, we can identify interesting dynamics from which a semantically stable set of descriptors may emerge for a given resource. This semantic stability has important implications for the collective usefulness of individual tagging behavior since it suggests that information organization systems can achieve meaningful resource descriptions and interoperability across distributed systems in a decentralized manner [23]. Semantically stable tagging streams of resources are not only essential for attaining meaningful resource interoperability across distributed systems and search, but also for learning lightweight semantic models and ontologies from tagging data (see e.g., [28, 30, 25]) since ontologies are agreed-upon and shared conceptualizations [15].
Therefore, semantic stability of social tagging streams is a prerequisite for learning ontologies from tagging data, since it measures the extent to which users have produced a stable and agreed-upon description of a resource. These observations have sparked a series of research efforts focused on (i) methods for assessing semantic stability in tagging streams (see e.g., [13, 16]), (ii) empirical investigations into the semantic stabilization process and the cognitive processes behind tagging (see e.g., [12, 22]) and (iii) models for simulating the tagging process (see e.g., [5, 9]). Figure 1 gives an example of such a real-world tagging stream and a corresponding approach to assert semantic stabilization [13]. This previous work has proposed to visually analyze the relative tag proportions of the resource being tagged as a function of consecutive tag assignments. We define a (social) tagging stream as a temporally ordered sequence of tags that annotate a resource.

Figure 1: Relative tag proportions of (a) one heavily tagged Twitter user (Nathan Fillion) and (b) one moderately tagged Twitter user (Sky Sports). One can see that the relative tag proportions become stable as more users assign tags to the two sample users.

We can assume that the tagging stream of a resource becomes stable if the relative tag proportions stop changing.
Research questions.
While previous work makes a promising case for the existence of semantic stabilization in tagging streams, it raises more questions that require further attention, including but not limited to the following: (i) What exactly is semantic stabilization in the context of social tagging streams, and how can we assert it in a robust way? (ii) How suitable are the different methods which have been proposed so far and how do they differ? (iii) Does semantic stabilization of resources vary across different social tagging systems and if yes, in what ways? And finally, (iv) what are the factors that may explain the emergence of semantic stability in social tagging streams?
Contributions.
The main contributions of this work are threefold. We start by making a methodological contribution. Based on a systematic discussion of existing methods for asserting semantic stability in social tagging systems, we identify potentials and limitations. We illustrate these on a previously unexplored people tagging dataset and a synthetic random tagging dataset. We explore different subsamples of our dataset including heavily or moderately tagged resources (i.e., a high or moderate number of users have tagged a resource). Using these insights, we present a novel and flexible method which allows us to measure and compare the semantic stabilization of different tagging systems in a robust way. Flexibility is achieved through the provision of two meaningful parameters; robustness is demonstrated by applying it to random control processes. Our second contribution is empirical. We conduct large-scale, empirical analyses of semantic stabilization in a series of distinct social tagging systems using our method. We find that semantic stabilization of tags varies across different systems, which requires deeper explanations of the dynamics underlying stabilization processes in social tagging systems. Our final contribution is explanatory. We investigate factors which may explain stabilization processes in social tagging systems. Our results show that tagging streams which are generated by a combination of imitation dynamics and shared background knowledge exhibit faster and higher semantic stability than tagging streams which are generated via imitation dynamics or natural language streams alone.
Structure.
This paper is structured as follows: We start by discussing related work and methods for measuring semantic stabilization in tagging systems in Section 2. In Section 3 we highlight that not all state-of-the-art methods are equally suited for measuring semantic stability of tagging systems, and that some important limitations hinder progress towards a deeper understanding of the social-semantic dynamics involved. Based on this discussion, we introduce the data used for our empirical study in Section 4 and present a novel method for assessing semantic stability and for exploring the semantic stabilization process in Section 5. In Section 6 we aim to shed some light on the factors which may influence the stabilization process. We discuss and conclude our work in Sections 7 and 8.
2. RELATED WORK
Social tagging systems have emerged as an alternative to traditional forms of organizing information which usually enforce rigid taxonomies or ontologies with controlled vocabulary. Social tagging systems, however, allow users to freely choose so-called tags to annotate resources such as websites, users, books, videos or artists. In past research, it has been suggested that stable patterns may emerge when a large group of users annotates resources on the Web. That means, users seem to reach a consensus about the description of a resource over time, despite the lack of a centralized vocabulary which is a central element of traditional forms of organizing information [13, 16, 5]. Several methods have been established to measure this semantic stability: (i) in previous work one co-author of this paper suggested to assess semantic stability by analyzing the proportions of tags for a given resource as a function of the number of tag assignments [13]. (ii) Halpin et al. [16] proposed a direct method for quantifying stabilization by using the Kullback-Leibler (KL) divergence between the rank-ordered tag frequency distributions of a resource at different points in time. (iii) Cattuto et al. [5] showed that power law distributions emerge when looking at rank-ordered tag frequency distributions of a resource, which is an indicator of semantic stabilization. Several attempts and hypotheses which aim to explain the observed stability have emerged. In [13] the authors propose that the simplest model that results in a power law distribution of tags would be the classic Polya Urn model. The first model that formalized the notion of new tags was proposed by Cattuto et al. [5] by utilizing the Yule-Simon model [33].
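To make these imitation dynamics concrete, the following is a minimal, hypothetical sketch of a Yule-Simon-style tagging stream; the function name and parameter values are our own illustration, not the cited models' implementations:

```python
import random
from collections import Counter

def simulate_imitation_stream(n, p_new=0.1, seed=42):
    """Yule-Simon-style stream: with probability p_new a user invents a
    fresh tag; otherwise she imitates a tag drawn uniformly from the
    stream so far, so already-popular tags are copied more often."""
    rng = random.Random(seed)
    stream = ["tag_0"]
    next_id = 1
    for _ in range(n - 1):
        if rng.random() < p_new:
            stream.append("tag_%d" % next_id)
            next_id += 1
        else:
            stream.append(rng.choice(stream))
    return stream

stream = simulate_imitation_stream(5000)
counts = Counter(stream)
# A few early tags dominate while most tags stay rare, yielding the
# heavy-tailed frequency distribution such imitation models reproduce.
```

In a run of this kind, a handful of early tags accumulate most assignments while the majority of tags are used only once or twice.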
Further models like the semantic imitation model [12] or simple imitation mechanisms [22] have been deployed for explaining and reconstructing real-world semantic stabilization. While the above models mainly focus on the imitation behavior of users for explaining the stabilization process, shared background knowledge might also be a major factor, as one co-author of this work already hypothesized in previous work [13]. Research by Dellschaft et al. [9] picked up this hypothesis and explored the utility of background knowledge as an additional explanatory factor which may help to simulate the tagging process. Dellschaft et al. show that combining background knowledge with imitation mechanisms improves the simulation results. Although their results are very strong, their evaluation has certain limitations since they focus on reproducing the sharp drop of the rank-ordered tag frequency distribution between rank 7 and 10 which was previously interpreted as one of the main characteristics of tagging data [3]. However, recent work by Bollen et al. [2] questions that the flattened head of these distributions is a characteristic which can be attributed to the tagging process itself. Instead, it may only be an artifact of the user interface which suggests up to ten tags. Bollen et al. show that the power law forms regardless of whether tag suggestions are provided to the user or not, making a strong point towards the utility of background knowledge for explaining the stabilization. In addition to imitation and background knowledge, an alternative and completely different explanation for the stable patterns which one can observe in tagging systems exists, namely the regularities and stability of natural language systems. Tagging systems are built on top of natural language, and if all natural language systems stabilize over time, tagging streams will stabilize as well.
Zipf’s law [34] states that the frequency of a word in a corpus is proportional to the inverse of its frequency rank and was found in many different natural language corpora (cf. [26]). However, some researchers claim that Zipf’s law is inevitable and that even a randomly generated letter sequence exhibits Zipf’s law [6, 20]. Recent analyses refuted this claim [8, 11] and further showed that language networks (based on word co-occurrences) exhibit small world effects and scale-free degree distributions [10]. While previous work mainly neglected the impact of individuals' tagging pragmatics, our previous work showed that not all users contribute equally to the emergence of tag semantics and that “describers” are more useful than “categorizers” [19]. Similar to our work, [21] also investigates tag distributions on a macro level (i.e., per system) and on a micro level (i.e., per resource). However, unlike our work, they use distribution fitting to explore the stabilization process.
3. STATE-OF-THE-ART METHODS FOR ASSESSING SEMANTIC STABILIZATION
In the following, we compare and discuss three existing and well-known state-of-the-art methods for measuring stability of tag distributions: Stable Tag Proportions [13], Stable Tag Distributions [16] and Power Law Fits [5]. We define the tag distributions as rank-ordered tag frequencies where the frequency of a tag depends on how many users have assigned the tag to a resource. We illustrate the usefulness and limitations of these methods on a previously unexplored people tagging dataset and a synthetic random tagging dataset which will both be described in Section 4. Each section (i) points out the intuition and definition of the method, (ii) applies the method to the data, and (iii) describes limitations and potentials of the method at hand.

Intuition and Definition:
In previous work, Golder and Huberman [13] analyzed the relative proportion of tags assigned to a given resource (i.e., P(t|e) where t is a tag and e is a resource) as a function of the number of tag assignments. In their empirical study on Delicious the authors found a stable pattern in which the proportions of tags are nearly fixed after a few hundred tag assignments of each website.

Demonstration:
In Figure 1 we observe that the tags of different types of resources (Twitter users rather than websites) also give rise to a stable pattern in which the relative proportions of tags are nearly fixed. This indicates that although users keep creating new tags and assign them to resources, the proportions of the tags per resource become stable. The limitations of the methods are independent of the dataset, and we get similar results using the other datasets introduced in Section 4.

Figure 2: Relative tag proportion of a random tagging process where each tag assignment on the x-axis corresponds to picking one of the five tags uniformly at random. One can see that all tag proportions become relatively stable over time but are all similar.
Limitations and Potentials:
In [13] the authors suggest that the stability of tag proportions indicates that users have agreed on a certain vocabulary which describes the resource. However, tag distributions produced by a random tagging process (see Figure 2) also become stable as more tag assignments take place, since the impact of a constant number of tag assignments decreases over time because the total sum of the tag frequency vector increases. However, the stable tagging patterns shown in Figure 1 go beyond what can be explained by a random tagging model, since a random tagging model produces similar proportions for all tags (see Figure 2). Hence, small changes in the tag frequency vector are enough to change the order of the ranked tags (i.e., the relative importance of the tags for the resource). For real tag distributions this is not the case, since these tag distributions have short heads and heavy tails, i.e., few tags are used far more often than most others. We exploit this observation for defining our novel method for assessing semantic stability in Section 5.
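The random-baseline argument can be reproduced with a short simulation; this is a hypothetical sketch (names are ours, not from the paper):

```python
import random
from collections import Counter

def relative_proportions(stream, vocabulary):
    """Relative proportion of each tag after every consecutive assignment."""
    counts = Counter()
    history = []
    for i, tag in enumerate(stream, start=1):
        counts[tag] += 1
        history.append({t: counts[t] / i for t in vocabulary})
    return history

# Uniform random tagging over five tags: every proportion settles near 1/5,
# so stable proportions alone do not imply agreement on a description.
random.seed(0)
vocabulary = ["a", "b", "c", "d", "e"]
stream = [random.choice(vocabulary) for _ in range(5000)]
final = relative_proportions(stream, vocabulary)[-1]
```

The proportions stabilize, but at roughly equal values for all tags, in contrast to the short-head, heavy-tail shape of real tagging streams.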
Intuition and Definition:
Halpin et al. [16] present a method for measuring semantic stabilization by using the Kullback-Leibler (KL) divergence between the tag distributions of a resource at different points in time. The KL divergence between two probability distributions P and Q is defined as follows:

D_KL(P || Q) = Σ_i P(x_i) ln( P(x_i) / Q(x_i) )   (1)

The authors use the rank-ordered tag frequencies of the 25 highest ranked unique tags per resource at different points in time to compute the KL divergence. The authors use one month as a time window rather than using a fixed number of tag assignments as Golder and Huberman [13] did or we do. This is important since their measure, per definition, converges towards zero if the number of tag assignments is constant, as we will show later.

Demonstration:
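Before applying the measure to data, the divergence computation of Equation 1 can be sketched as follows; this is a minimal illustration over two pruned, conjoint tag distributions, not the authors' code:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) over a shared support. Q must be non-zero wherever P is,
    which is one reason the method prunes both distributions to the same
    top-k tags before comparing them."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # tag proportions after N assignments
q = [0.4, 0.4, 0.2]   # tag proportions after N + M assignments
divergence = kl_divergence(p, q)   # small positive value
identical = kl_divergence(p, p)    # 0.0: no change between time points
```

A divergence approaching zero over consecutive time points is what the method reads as stabilization.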
We use the rank-ordered tag frequencies of the 25 highest ranked tags of each resource and a constant number (M) of consecutive tag assignments. We compare the KL divergence of tag distributions after N and N + M consecutive tag assignments. Using a fixed number of consecutive tag assignments allows exploring the properties of a random tag distribution which is generated by drawing M random samples from a uniform multinomial distribution. In Figure 3, each point on the x-axis consists of M = 10 consecutive tag assignments and N ranges from 0 to 1000. The black dotted line indicates the KL divergence of a random tag distribution. One can see from this figure that not only the tag distributions of resources seem to converge towards zero over time (with few outliers), but random tag distributions do as well.

Figure 3: KL divergence between the tag distributions at consecutive time points for (a) heavily tagged users and (b) moderately tagged users. Each colored line corresponds to one Twitter user, while the black dotted line depicts a randomly simulated tag distribution. One can see that the KL divergence decreases as a function of the number of tag assignments. The KL divergence of a random tagging process decreases slightly slower than the KL divergence of the real tagging data.

Limitations and Potentials:
A single tag assignment in month j has more impact on the shape of the tag distribution of a resource than a single tag added in month j + 1, if we assume the number of tags which are added per month is relatively stable over time. However, if the number of tag assignments per resource varies a lot across different months, convergence can be interpreted as semantic stabilization. This suggests that without knowing the frequencies of tag assignments per month, the measure proposed by Halpin et al. [16] is limited with regard to its usefulness, since one never knows whether stabilization can be observed because users agreed on a certain set of descriptors and their relative importance for the resource, or because the tagging frequency in later months was lower than in earlier months. In our work (see Figure 3), we compare the KL divergence of a randomly generated tag distribution with the KL divergence of real tag distributions. This reveals how much faster users reach consensus compared to what one would expect. Even though we believe this method already improves the original approach suggested by Halpin et al. [16], it is still limited because it requires limiting the analysis to the top k tags. The KL divergence is only defined between two distributions over the same set of tags. We address this limitation with the new method which we propose in Section 5.

Intuition and Definition:
Tag distributions which follow a power law are sometimes regarded as semantically stable, (i) because of the scale invariance property of power law distributions, i.e., regardless of how large the system grows, the slope of the distribution stays the same, and (ii) because power law distributions are heavy-tailed distributions, i.e., few tags are applied very frequently while the majority of tags is hardly used. Adam Mathes [24] originally hypothesized that tag distributions in social tagging systems follow a power law function. Several studies empirically show that the tag distributions of resources in social tagging systems indeed follow a power law [29, 18, 4, 5]. A power law distribution is defined by the function:

y = c x^(-α) + ε   (2)

Both c and α are constants characterizing the power law distribution and ε represents the uncertainty in the observed values. The most important parameter is the scaling parameter α as it represents the slope of the distribution [2, 7]. It is also important to remark that real world data nearly never follows a power law for the whole range of values. Hence, it is necessary to find some minimum value xmin for which one can say that the tail of the distribution with x ≥ xmin follows a power law [7].

Demonstration:
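Under these definitions, the fitting procedure (maximum likelihood for α on the tail, Kolmogorov-Smirnov statistic for choosing xmin) can be sketched roughly as follows, using the continuous-data estimator from Clauset et al. [7]; this is an illustrative sketch, not the authors' implementation:

```python
import math

def fit_alpha(data, xmin):
    """Continuous power-law MLE for the scaling exponent alpha."""
    tail = [x for x in data if x >= xmin]
    return 1 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(data, xmin, alpha):
    """Kolmogorov-Smirnov distance D between the empirical tail CDF and
    the fitted power-law CDF; smaller D means a better fit."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1 - (x / xmin) ** (1 - alpha)
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d

def best_fit(data, min_tail=10):
    """Scan candidate xmin values and keep the one minimizing D,
    as suggested by Clauset et al."""
    best = None
    for xmin in sorted(set(data)):
        if sum(1 for x in data if x >= xmin) < min_tail:
            break
        alpha = fit_alpha(data, xmin)
        d = ks_distance(data, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best  # (xmin, alpha, D)
```

On synthetic power-law data the estimator recovers the true exponent closely; on real tag distributions the scanned xmin can end up large, shortening the fitted tail, a limitation discussed below.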
We first visualize the rank frequency tag distributions (see Figure 4(a) and Figure 4(b)) and the complementary cumulative distribution function (CCDF) of the probability tag distributions (see Figure 4(c) and Figure 4(d)) on a log-log scale. We see that for heavily and moderately tagged resources, few tags are applied very frequently while the vast majority of tags are used very rarely. Figure 4(c) and Figure 4(d) show that the tag distributions of heavily and moderately tagged resources are dominated by a large number of tags which are only used once. Figure 4 reveals that the tails of the tag distributions (starting from a tag frequency of 2) are close to a straight line. The straight line, which is a main characteristic of power law distributions plotted on a log-log scale, is more visible for heavily tagged resources than for moderately tagged ones. We can now hypothesize that a power law distribution could be a good fit for our data if we look at the tail of the distribution with a potential xmin. For fitting α we use a maximum likelihood estimation, and for finding the appropriate xmin value we use the Kolmogorov-Smirnov statistic as suggested by Clauset et al. [7]. As proposed in previous work [2, 7], we also look at the Kolmogorov-Smirnov distance D of the corresponding fits: the smaller D, the better the fit. Table 1 shows the parameters of the best power law fits, averaged over all heavily tagged or moderately tagged resources. One can see from this table that the α values are very similar for both datasets and also fall in the typical range of power law distributions. Further, one can see that the power law fits are slightly better for heavily tagged resources than for moderately tagged ones, as also suggested by Figure 4. (We use the term tail to characterize the end of a distribution in the sense of probability theory.)

Table 1: Parameters of the best power law fits (α, xmin, and Kolmogorov-Smirnov distance D, each with standard deviation, for heavily and moderately tagged users).

Figure 4: Rank-ordered tag frequency and CCDF plots for heavily tagged and moderately tagged users on log-log scale. The illustrations show that for both heavily and moderately tagged resources, few tags are applied very frequently while the vast majority of tags is applied very rarely. In Figure 4(c) and Figure 4(d) we can see that a large number of tags are only used once. The figures visualize that the tails of the tag distributions are close to a straight line which suggests that the distributions might follow a power law.

Although our results suggest that it is likely that our distributions have been produced by a power law function, further investigations are warranted to explore whether other heavy-tailed candidate distributions are better fits than the power law [7, 1]. We compare our power law fit to the fit of the exponential function, the lognormal function and the stretched exponential (Weibull) function. We use log-likelihood ratios to indicate which fit is better. The exponential function represents the absolute minimal candidate function to describe a heavy-tailed distribution. That means, if the power law function is not a better fit than the exponential function, it is difficult to assess whether the distribution is heavy-tailed at all. The lognormal and stretched exponential function represent more sensible heavy-tailed functions. Clauset et al.
[7] point out that there are only a few domains where the power law function is a better fit than the lognormal or the stretched exponential. Our results confirm this since we do not find significant differences between the power law fit and the lognormal fit (for both heavily and moderately tagged users). However, most of the time the power law function is significantly better than the stretched exponential function, and the power law function is a significantly better fit than the exponential function for all heavily tagged users and for most moderately tagged users. This indicates that the tag distributions of heavily tagged resources and most moderately tagged resources are clearly heavy-tailed distributions and the power law function is a reasonably good explanation. However, it remains unclear from which heavy-tailed distribution the data has been drawn since several of them produce good fits.
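The log-likelihood ratio comparison against the minimal exponential candidate can be sketched as follows, assuming the continuous power-law density and an exponential shifted to start at xmin; this is a hypothetical illustration, not the authors' code. A positive ratio favors the power law:

```python
import math

def loglik_powerlaw(tail, xmin, alpha):
    """Log-likelihood of the tail under a continuous power law."""
    return sum(math.log((alpha - 1) / xmin) - alpha * math.log(x / xmin)
               for x in tail)

def loglik_exponential(tail, xmin):
    """Log-likelihood under an exponential shifted to start at xmin,
    with the rate set to its maximum likelihood estimate."""
    lam = 1.0 / (sum(tail) / len(tail) - xmin)
    return sum(math.log(lam) - lam * (x - xmin) for x in tail)

def loglik_ratio(tail, xmin, alpha):
    """Positive values favor the power law over the exponential."""
    return loglik_powerlaw(tail, xmin, alpha) - loglik_exponential(tail, xmin)
```

In practice one would also test the ratio's significance (e.g., with the normalized statistic of Vuong's test, as done in [7]) rather than relying on its sign alone.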
Limitations and Potentials:
As we have shown, one limitation of this method is that it is often difficult to determine which distribution has generated the data since several distributions with similar characteristics may produce an equally good fit. Furthermore, the automatic calculation of the best xmin value for the power law fit has certain consequences since xmin might become very large and therefore the tail to which the power law function is fitted may become very short. Finally, there is still an ongoing discussion about the informativeness of scaling laws (see [17] for a good overview), since some previous work suggests that there exist many ways to produce scaling laws and some of those ways are idiosyncratic and artifactual [27, 20].
4. EXPERIMENTAL SETUP AND DATASETS
We conduct large-scale, empirical analyses of the semantic stabilization process in a series of different social tagging systems using the state-of-the-art methods described in Section 3 and a new method introduced in Section 5. Table 2 gives an overview of the datasets obtained from distinct tagging systems using the nature of the resource being tagged, the sequential order of the tagging process (i.e., is the resource selected first or the tag), the existence or absence of tag suggestions and the visibility of the tags which have been previously assigned to a resource. We say that tags have a low visibility if users do not see them during the tagging process and if they are not shown on the page of the resource being tagged. Otherwise, tags have a high visibility. Further, the number of resources, users and tags per dataset are shown.
Delicious dataset:
Delicious is a social tagging system where users can tag any type of website. We use the Delicious dataset crawled by Görlitz et al. [14]. From this dataset we randomly selected 100 websites which were tagged by many users (more than 4k users) and 100 websites which were moderately tagged (i.e., by less than 4k but more than 1k users) and explore the consecutive tag assignments for each website. The original dataset is available online.

LibraryThing dataset:
LibraryThing is a social tagging system which allows users to tag books. We use the LibraryThing dataset which was crawled by Zubiaga et al. [35]. Again, we randomly sampled 100 books that were heavily tagged (more than 2k users) and 100 books which were moderately tagged (less than 2k and more than 1k users) and explore the consecutive tag assignments for each book.
Twitter dataset:
Twitter is a microblogging service that allows users to tag their contacts by grouping them into user lists with a descriptive title. The creation of such list titles can be understood as a form of tagging since list titles are free-form words which are associated with one or several resources (in this case users). What is unique about this form of tagging is that the tag (aka the list title) is usually produced first, and then users are added to this list, whereas in more traditional tagging systems such as Delicious, the process is the other way around. From a Twitter dataset which we described in previous work [31], we selected a sample of 100 heavily tagged users (which are mentioned in more than 10k lists) and 100 moderately tagged users (which are mentioned in less than 10k lists and more than 1k lists). For each of these sample users we crawled the full history of lists to which a user was assigned. We do not know the exact time when a user was assigned to a list but we know the relative order in which a user was assigned to different lists. Therefore, we can study the tagging process over time by using consecutive list assignments as a sequential ordering. It needs to be noted that the thresholds we have used above during the data collection are distinct for each tagging system since those systems differ, among others, in their number of active users and size. We chose the thresholds empirically and found that the choice of threshold does not impact our results since heavily tagged as well as moderately tagged resources show similar characteristics. Finally, we also contrast our tagging datasets with a natural language corpus (see Section 6.2) and a random tagging dataset.
This allows us, on the one hand, to explore to what extent the semantic stabilization which can be observed in tagging systems goes beyond what one would expect if the tagging process were a random process; and on the other hand, to compare the semantic stabilization of the tag distributions of resources with the semantic stabilization of co-occurring word distributions of resources.

Natural Language corpus:
As a natural language corpus we use a sample of tweets which refer to the same resource. Therefore, we selected a random sample of users from our Twitter dataset which have received tweets from many distinct users (more than 1k). For those users, we select a sample of up to 10k tweets they received. The words in those tweets are extracted and interpreted as social annotations of the receiver. This allows us to compare tags with words, both annotating a resource (in this case a user).
Synthetic random tagging dataset:
Given a fixed vocabulary size, we can create a random tagging dataset by simulating the tagging process as random draws from an urn (containing all possible tags of the vocabulary) where each ball (i.e., tag) is returned to the urn after each draw.
5. MEASURING SEMANTIC STABILITY
Based on the analysis of state-of-the-art methods presented in Section 3, we (i) present a novel method for assessing the semantic stability of individual tagging streams and (ii) show how this method can be used to assess and compare the stabilization process in different tagging systems. Our new method incorporates three new ideas:
Ranking of tags:
A tagging stream can be considered semantically stable if users have agreed on a ranking of tags which remains stable over time. It is more important that the ranking of frequent tags remains stable than the ranking of less frequent tags, since frequent tags are those which might be more relevant for a resource. Frequent tags have been applied by many users to a resource, and stable rankings of these tags therefore indicate that a large group of users has agreed on the relative importance of the tags for that resource.

(We share the Twitter user handles to allow other researchers to recreate our dataset and reproduce our results, for our heavily tagged (http://claudiawagner.info/data/gr_10k_username.csv) and moderately tagged (http://claudiawagner.info/data/less_10k_username.csv) Twitter users.)
Random baselines:
The semantic stability of a random tagging process needs to be considered as a baseline for stability, since we are interested in exploring stable patterns which go beyond what can be explained by a random tagging process.
New tags over time:
New tags can be added over time, and therefore a method which compares the tag distributions of one resource at different points in time must be able to handle mutually non-conjoint tag distributions, i.e., distributions which contain tags that turn up in one distribution but not in the other. Most measures used in previous work (e.g., the KL divergence) only allow comparing the agreement between mutually conjoint lists of elements, and a common practice is to prune tag distributions to their top k elements, i.e., the most frequently used tags per resource. However, this pruning requires global knowledge about the tag usage and only enables a post-hoc rather than a real-time analysis of semantic stability.
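The support problem with the KL divergence can be made concrete with a small sketch. The tag probabilities below are made up for illustration; the point is only that a tag appearing in one snapshot but not the other makes the divergence infinite:

```python
import math

def kl(p, q):
    # KL divergence D(p || q); it blows up whenever p puts mass on a tag
    # that q has never seen, so both snapshots must share the same support
    return sum(pi * math.log(pi / qi) if qi > 0 else math.inf
               for pi, qi in zip(p, q) if pi > 0)

earlier = [0.6, 0.4, 0.0]   # the third tag has not been used yet
later   = [0.5, 0.3, 0.2]   # a brand-new tag has appeared
assert kl(earlier, earlier) == 0.0
assert kl(later, earlier) == math.inf   # non-conjoint supports break the comparison
```

This is why a rank-based measure that tolerates non-conjoint lists (such as RBO, introduced below) is a better fit for growing tag streams than distribution divergences over a fixed support.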
RBO(σ1, σ2, p) Intuition and Definition:

The Rank Biased Overlap (RBO) [32] measures the similarity between two rankings and is based on the cumulative set overlap. The set overlap at each rank is weighted by a geometric sequence, providing both top-weightedness and convergence. RBO is defined as follows:
RBO(σ1, σ2, p) = (1 − p) · Σ_{d=1}^{∞} (|σ1:d ∩ σ2:d| / d) · p^(d−1)   (3)

Let σ1 and σ2 be two rankings, and σ1:d and σ2:d the corresponding ranked lists at depth d. RBO falls in the range [0, 1]. The parameter p (0 ≤ p < 1) determines how steep the decline in weights is. The smaller p is, the more top-weighted the metric is. If p = 0, only the top-ranked item of each list is considered and the RBO score is either zero or one. On the other hand, as p approaches arbitrarily close to 1, the weights become arbitrarily flat. These weights, however, are not the same as the weights that the elements at different ranks d themselves take, since these elements contribute to multiple agreements.

In the following, we use a version of RBO that accounts for tied ranks. As suggested in [32], ties are handled by assuming that if t items are tied for ranks d to d + (t − 1), all of them occur at rank d. RBO may account for ties by dividing twice the overlap at depth d by the number of items which occur at depth d, rather than by the depth itself:

RBO(σ1, σ2, p) = (1 − p) · Σ_{d=1}^{∞} (2 · |σ1:d ∩ σ2:d| / (|σ1:d| + |σ2:d|)) · p^(d−1)   (4)

We modify RBO by summing only over occurring depths rather than all possible depths. Therefore, our RBO measure further penalizes ties and assigns a lower RBO value to pairs of lists containing ties. For example, consider the following two pairs of ranked lists of items: (i) (A=1, B=2, C=3, D=4), (A=3, B=2, C=1, D=4) and (ii) (A=1, B=1, C=1, D=4), (A=1, B=1, C=1, D=4). Both pairs of lists have the same concordant pairs: (A,D), (B,D) and (C,D). The RBO value of the tied pair (ii) is 0.34 according to the original measure and 0.17 according to our tie-corrected variant. This example nicely shows that while the original RBO measure tends to overestimate ties, our variant slightly penalizes ties. For our use case this makes sense, since we do not want to overestimate the semantic stability of a resource where users have not agreed on a ranking of tags but only find that all tags are equally important.

Table 2: Description of the datasets and characteristics of the social tagging systems the data stem from.

System | Entity Type | Tag First | Tag Suggestions | Tags Visible
Delicious | websites | no | yes | low | 17,000k | 532k | 2,400k
LibraryThing | books | no | no | high | 3,500k | 150k | 2,000k
Twitter lists | users | yes | no | low | 3,286 | 2,290k | 1,111k

Figure 5: Rank Biased Overlap (RBO) measures with p = 0.9. The black dotted line shows the weighted average RBO of a random tagging process over time, while each colored line corresponds to the RBO of one Twitter user. (a) Heavily tagged users. (b) Moderately tagged users.
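The tie-corrected RBO of equation (4) can be sketched as follows. This is our simplified reading: the sum is truncated at the deepest observed rank rather than extrapolated to infinite depth, rankings are given as lists of tie groups, and the `occurring_only` flag switches on the modification of summing only over occurring depths. Function names are ours:

```python
def _prefix_sets(groups):
    # map each occurring depth d to the set of items ranked at or above d;
    # tied items (one group) all occur at the group's starting rank
    start, seen, out = 1, set(), {}
    for g in groups:
        seen = seen | set(g)
        out[start] = set(seen)
        start += len(g)                    # t tied items consume ranks d..d+t-1
    return out

def rbo_tied(groups1, groups2, p=0.9, occurring_only=True):
    # truncated, tie-aware RBO per eq. (4); occurring_only=True applies the
    # paper's modification of summing only over depths where items occur
    pre1, pre2 = _prefix_sets(groups1), _prefix_sets(groups2)
    max_depth = max(sum(len(g) for g in groups1), sum(len(g) for g in groups2))
    depths = sorted(set(pre1) | set(pre2)) if occurring_only \
        else range(1, max_depth + 1)
    s1, s2, total = set(), set(), 0.0
    for d in depths:
        s1 = pre1.get(d, s1)               # carry the prefix set forward
        s2 = pre2.get(d, s2)
        total += 2 * len(s1 & s2) / (len(s1) + len(s2)) * p ** (d - 1)
    return (1 - p) * total

# the tied pair (ii) from the text: (A=1, B=1, C=1, D=4) compared with itself
tied = [["A", "B", "C"], ["D"]]
assert round(rbo_tied(tied, tied, p=0.9, occurring_only=False), 2) == 0.34
assert round(rbo_tied(tied, tied, p=0.9), 2) == 0.17
```

With `occurring_only=True`, the depths 2 and 3 (where no new item occurs because of the three-way tie) contribute no weight, which is exactly why the variant scores the tied pair lower (0.17) than the original formulation does (0.34).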
Demonstration:
Figure 5 shows the RBO of the tag distributions of resources over time for our people tagging dataset. The RBO value between the tag distribution after N and after N + M tag assignments is high if the M new tag assignments do not change the ranking of the (top-weighted) tags. One can see from Figure 5 that the RBO of a randomly generated tag distribution is rather low and increases slowly as more and more tags are added over time. On the contrary, the RBO of real tag distributions increases as more and more tags are added. At the beginning it increases quickly, and it remains relatively stable after a few thousand tag assignments. This indicates that the RBO measure allows identifying a consensus in the tag distributions which may emerge over time and which goes beyond what one would expect from a random tagging process. A random tagging process produces relative tag proportions which are all very similar (i.e., all tags are equally important or unimportant). Therefore, the probability that the ranking changes after new tag assignments is higher than it is for real tagging streams, where users have produced a clear ranking of tags in which some tags are much more important than others. Over time, the gap between real tagging streams and random tagging streams will decrease. However, one can see that within the time window in which real tagging streams semantically stabilize (i.e., a few thousand tag assignments), tag distributions produced by a random process are significantly less stable. Again, we can see that the tag distributions of heavily tagged resources are slightly more stable than those of moderately tagged resources.
Figure 6: The percentage of resources (in this case heavily tagged Twitter users) stabilized at time t with stability threshold k. The contour lines mark the points at which 45%, 60%, 75% and 90% of resources are stable. For example, point P indicates that after 1250 tag assignments, 90% of resources exhibit semantic stability (an RBO value) of the corresponding threshold k or higher.

In our work, we empirically chose p = 0.9, which spreads the weight over many of the top-ranked items. For example, when choosing p = 0.98, the first 50 items get 86% of the weight. If one would choose a lower value for p such as p = 0.1 (or p = 0.5), the first two elements would get 99.6% (or 88.6%) of the weight. That means that all elements with a rank lower than two would be almost ignored, and therefore the RBO values would show more fluctuation. However, in all our experiments with different p values, the RBO of real tag distributions was significantly higher than the RBO of random tag distributions.

Limitations and Potentials:
One advantage of RBO is that it handles mutually non-conjoint lists of tags, weights highly ranked tags more heavily than lower ranked tags, and is monotonic with increasing depth of evaluation. A potential limitation of RBO is that it requires picking the parameter p, which defines the decline in weights, i.e., how top-weighted the RBO measure is. Which level of top-weightedness is appropriate for the tag distributions in different tagging systems might be a controversial question. However, our experiments revealed that as long as the parameter p was not chosen to be very small, our results were not sensitive to its exact value.

Based on the previously defined Rank Biased Overlap, we propose a method which allows investigating the semantic stabilization process in a social tagging system based on the stabilization process of the individual resources which are tagged. Furthermore, this method also allows comparing the extent to which different systems have become stable. Given a sample of tagged resources (the sample size N and the type of resources can be chosen arbitrarily), the goal is to specify how many resources of the sample have stabilized after a certain number of consecutive tag assignments. We propose a flexible and fluid definition of the concept of stabilization by introducing (a) a parameter k that constitutes a threshold for the RBO value and (b) a parameter t that specifies the number of consecutive tag assignments. We call a resource in a social tagging system semantically stable at point t if the RBO value between its tag distribution at point t − 1 and at point t is equal to or greater than k. Our proposed method allows calculating the percentage of resources that have semantically stabilized after a number of consecutive tag assignments t according to some stabilization threshold k. We can define this function by:

f(t, k) = (1/N) · Σ_{i=1}^{N} [ 1 if RBO(σ^i_{t−1}, σ^i_t, p) > k, else 0 ]   (5)

We illustrate the semantic stabilization for our sample of heavily tagged Twitter users in Figure 6. The contour plot depicts the percentage of resources (i.e., Twitter users) which have become semantically stable according to some RBO threshold k after t tag assignments. The figure shows that after 1k tag assignments, 90% of Twitter users have reached high RBO values.

Figure 7: Semantic stabilization of different social tagging datasets, a natural language corpus and a synthetic random tagging dataset as a control. The x-axis represents the consecutive tag assignments t, while the y-axis depicts the RBO (with p = 0.9) threshold k. The contour lines illustrate the curves for which the function f(t, k) has constant values. These values are depicted on the lines and represent the percentage of stabilization f. One can see that tagging streams in Delicious and LibraryThing stabilize faster and reach higher levels of semantic stability than those in the other datasets.

In this section we use our novel method to compare the semantic stabilization process of the different social tagging systems introduced in Section 4. The contour plot in Figure 7 depicts the percentage of resources which have become semantically stable according to some RBO threshold k after t tag assignments in different social tagging systems. First of all, we can see that the random dataset exhibits by far the lowest stabilization, since its resources only stabilize for low k (k < 0.2) even after a large number of tag assignments t. That means that the k threshold for which 90%, 75%, 60% and 45% of all resources have RBO values equal to or higher than k is very low. In contrast, we can see that real-world tagging systems exhibit much higher stability. The highest (i.e., high k values) and fastest (i.e., low t values) overall tag stabilization can be observed for Delicious and LibraryThing, which both encourage imitation behavior, by suggesting previously assigned tags (Delicious) and by making previously assigned tags visible during the tagging process (LibraryThing).

In Twitter, users first have to create a tag (aka a user list) and afterwards select the resources (aka users) to which they want to assign the tag. During this tagging process, tags which have previously been assigned to users are not visible, and therefore it is unlikely that imitation behavior plays a major role in Twitter. Interestingly, our results show that despite the difference in the user interfaces, the people tagging streams in Twitter exhibit similar stabilization patterns as the book and website tagging streams in Delicious and LibraryThing. However, people tagging streams in Twitter stabilize slightly slower and less strongly than the tagging streams in Delicious and LibraryThing, where imitation behavior is encouraged. This result is striking since it suggests that imitation cannot be the only factor which causes the stable patterns which arise when a large group of users tags a resource. Our empirical results from different social tagging systems are in line with the results of the user study presented in [2], which also shows that tag distributions of resources become stable regardless of the visibility of previously assigned tags.
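The stabilization function f(t, k) of equation (5) is straightforward to compute once the consecutive-snapshot RBO values are available. A small sketch; the per-resource RBO series below are made-up values purely for illustration:

```python
def fraction_stable(rbo_series, t, k):
    # eq. (5): share of resources i whose RBO between the tag ranking
    # after t-1 and after t consecutive assignments exceeds threshold k
    return sum(series[t] > k for series in rbo_series) / len(rbo_series)

# toy data: per-resource RBO values at snapshots t = 0..4 (hypothetical)
series = [
    [0.10, 0.40, 0.60, 0.80, 0.90],
    [0.20, 0.30, 0.70, 0.85, 0.90],
    [0.00, 0.20, 0.50, 0.60, 0.70],
]
assert fraction_stable(series, t=4, k=0.8) == 2 / 3   # two of three resources stable
assert fraction_stable(series, t=1, k=0.8) == 0.0     # none stable early on
```

Evaluating this function over a grid of (t, k) pairs yields exactly the kind of contour plot shown in Figures 6 and 7.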
The presence of tag suggestions may provoke a higher and faster agreement between users who tag a resource and may therefore lead to higher levels of stability, but it is clearly not the only factor causing stability. Our results suggest that in tagging systems which encourage imitation, less than 1k tag assignments are necessary before a tagging stream becomes semantically stable (i.e., the rank agreement has reached a certain level and does not change anymore), while in tagging systems which do not encourage imitation, more than 1k tag assignments are required.
6. EXPLAINING SEMANTIC STABILITY
The experimental results reported in [2] as well as our own empirical results on the people tagging dataset from Twitter suggest that stable patterns may also arise in the absence of imitation behavior. As a consequence, other factors that might explain semantic stabilization, such as shared background knowledge and stable properties of natural language, deserve further investigation.

(If users want to see which other tags have previously been assigned to a user, they need to visit her profile page and navigate to the list membership section. Since this is fairly time-intensive, one can speculate that it is unlikely that users imitate previously assigned tags; instead, they create their own tags and assign users to them based on what they know about them and how they want to organize them.)

Figure 8: Semantic stabilization of synthetic (i.e., simulated) tagging processes, for I = 0.0 (background knowledge only), I = 0.3 (some background knowledge and little imitation), I = 0.7 (little background knowledge and some imitation), and I = 1.0 (imitation only). Tagging streams which are generated by a combination of imitation dynamics (70%) and background knowledge (30%) tend to stabilize faster and reach higher levels of stability than streams which are generated by imitation behavior (I = 1) or background knowledge (I = 0) alone.
To explore the potential impact of imitation and shared background knowledge, we simulate the tag choice process. According to [9], there are several plausible ways in which the tag choice process can be modeled:
Random tag choice:
Each tag is chosen with the same probability. This corresponds to users who randomly choose tags from the set of all available tags, which seems to be a plausible strategy only for spammers.
Imitation:
The tags are chosen with a probability that is proportional to the tag's occurrence probability in the previous stream. This selection strategy corresponds to the Polya urn model described in [13], where only tags that have been used before are in the urn and can be selected. This corresponds to users who are easily influenced by other users.
Background Knowledge:
The tags are chosen with a probability that is proportional to the tag's probability in the shared background knowledge of users. This corresponds to users who choose tags that seem appropriate based on their own background knowledge.

In our simulation, we assume that the tag choice of users might be driven by both imitation and background knowledge. Similar to the epistemic model [9], we introduce a parameter I describing the impact of imitation. Consequently, the impact of shared background knowledge is 1 − I. We run I from 0 to 1, i.e., we simulate tagging streams which have been generated by users who only use the imitation strategy to choose their tags (I = 1), users who only rely on their background knowledge when selecting tags (I = 0), and users who adopt both strategies. We use a word-frequency corpus from Wikipedia to simulate the shared background knowledge. For each synthetic dataset we simulate 100 tagging streams in order to have the same sample size as for our real-world datasets introduced in Section 4.

Our results in Figure 8 show the percentage of resources which have an RBO value equal to or higher than k after t tag assignments for the different synthetic tagging datasets. One can see from this figure that a synthetic tagging dataset with I = 1 (i.e., a dataset which was created solely via imitation behavior) does not stabilize over time, since more than 90% of the resources have very low RBO values (i.e., k < 0.1) even after a few thousand tag assignments. This is consistent with our intuition, since a model which is purely based on imitation dynamics fails to introduce new tags, and therefore no ranked lists of tags per resource can be created. Further, one can see that a synthetic tagging dataset with I = 0 (i.e., a tagging dataset which was created solely via background knowledge and therefore reflects the properties of a natural language system) stabilizes slightly slower than a synthetic tagging dataset which was generated by a mixture of background knowledge and imitation dynamics (I = 0.7). This shows that when shared background knowledge (encoded in natural language) is combined with social imitation, tagging streams reach higher levels of semantic stability more quickly (for lower t) than if users rely only on imitation behavior or only on background knowledge. Our findings are in line with previous research [9], which showed that an imitation rate between 60% and 90% is best for simulating real tag streams of resources. However, as described in Section 2, their work has certain limitations, which we address by (i) exploring a range of different social tagging systems, including one where no tags are suggested and previously assigned tags are not visible during the tagging process, and (ii) studying the semantic stabilization process over time rather than the shape of the rank-ordered tag frequency distribution at a single point in time.

Since tagging systems are natural language systems, the regularities and the stability of natural language (see e.g., [34] and [10]) may cause the stable patterns which we observe in tagging systems.
That means that one can argue that tagging systems become stable because they are built on top of natural language, which is itself stable. Our results presented in Figure 7 show that a natural language corpus (see Section 4), in which users talk about a set of sample resources, also becomes semantically stable over time and reaches a medium level of stability. Likewise, a synthetic tagging dataset generated from background knowledge alone (I = 0.0), and therefore reflecting the properties of natural language, becomes semantically stable over time and reaches a medium level of stability. The only tagging stream dataset which shows a similar stabilization process as the natural language dataset is the people tagging dataset obtained from Twitter, which does not support any imitation mechanisms. This suggests that the stability of natural language systems can indeed explain a large proportion of the stability which can be observed in tagging systems where the tagging process is not really social (i.e., each user annotates a resource separately without seeing the tags others used) and no imitation dynamics are supported. However, tagging systems which support the social aspect of tagging, e.g., by showing tags which have previously been applied by others, exhibit a faster and higher level of semantic stabilization than tagging systems which do not implement these social functionalities. This suggests that the semantic stability which can be observed in social tagging systems goes beyond what one would expect from natural language systems, and that the higher and faster degree of stability is achieved through the social dynamics in tagging systems; concretely, the imitation behavior of users.
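The tag-choice simulation described in this section can be sketched as follows. This is an illustrative reading of the mixture model: the background corpus below is a toy stand-in for the Wikipedia word-frequency corpus, and the function and parameter names are our own:

```python
import random

def simulate_stream(n, imitation_rate, background, rng):
    # one synthetic tagging stream for a single resource: with probability I
    # imitate a tag drawn proportionally to its past use (Polya urn dynamics),
    # otherwise draw a tag from the shared background-knowledge distribution;
    # the very first assignment always comes from background (the urn is empty)
    bg_tags, bg_weights = zip(*background.items())
    stream = []
    for _ in range(n):
        if stream and rng.random() < imitation_rate:
            # uniform draw over past tokens = proportional to tag frequency
            stream.append(rng.choice(stream))
        else:
            stream.append(rng.choices(bg_tags, weights=bg_weights, k=1)[0])
    return stream

rng = random.Random(42)
background = {"music": 5, "news": 3, "tech": 2}  # toy word-frequency corpus
stream = simulate_stream(1000, imitation_rate=0.7, background=background, rng=rng)
assert len(stream) == 1000
assert set(stream) <= set(background)
```

Running this for I = 0.0, 0.3, 0.7 and 1.0 and measuring the consecutive-snapshot RBO of each stream reproduces the kind of comparison shown in Figure 8, with the caveat that a realistic background distribution has a far larger, heavy-tailed vocabulary.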
7. DISCUSSION
The main implications of our work are: (i) We highlight limitations of existing methods for measuring semantic stability in social tagging streams and introduce a new and more robust method. Our method is not limited to social tagging systems and tagging streams; it can also be used to measure stability and user agreement in other types of data streams which are collectively created by a set of users (e.g., hashtag streams in Twitter or Wikipedia edit streams). (ii) Our simulation results suggest that, when aiming to improve the semantic stability of social tagging systems, system designers can exploit the insights gained from our work by implementing mechanisms which, for example, augment imitation in 70% of cases (e.g., by suggesting or showing previously assigned tags) while tapping into the background knowledge of users in 30% of cases (e.g., by requiring users to tag without recommendation mechanisms in place, thereby utilizing background knowledge).

In the future we also want to explore the lowest number of users that need to tag a resource in order to produce a stable tag description of it; for this we would also need to incorporate into our experiments the number of tags users simultaneously assign to resources. Further, we want to point out that, for the sake of simplicity, we used the same background knowledge corpus for all resources and neglected the impact of the user interface (i.e., the number of suggested tags and the number of previously used tags from which they are chosen) on the imitation process. These user interface parameters differ between tagging systems and have been varied over time. Without knowing exactly how the user interface looked during the tagging process and how the algorithm for suggesting and displaying tags worked, it is difficult to choose these parameters properly.
8. CONCLUSIONS
Based on an in-depth analysis of existing methods, we have presented a novel method for assessing semantic stabilization processes. We have applied our method empirically to different social tagging systems, and to different synthetic tagging streams via simulations. Our results reveal that semantic stability in tagging systems cannot be explained solely by the imitation behavior of users; rather, a combination of imitation and background knowledge exhibits the highest semantic stabilization. Summarizing, our work makes contributions on three different levels.
Methodological: Based on systematic investigations, we identify potentials and limitations of existing methods for assessing semantic stability in social tagging systems. Using these insights, we present a novel, yet flexible, method which allows measuring and comparing the semantic stabilization of different tagging systems in a robust way. Flexibility is achieved through the provision of two meaningful parameters; robustness is demonstrated by applying the method to random control processes. Our method is general enough to be applicable beyond social tagging systems, e.g., to streams of hashtags on Twitter.
Empirical: We conduct large-scale empirical analyses of semantic stabilization in a series of distinct social tagging systems using our method. We find that the semantic stabilization of tags varies across diverse systems that adopt different tagging mechanics, which calls for deeper explanations of the dynamics of the underlying stabilization processes.
Explanatory: We investigate factors which may explain stabilization processes in social tagging systems using simulations. Our results show that tagging streams which are generated by a combination of imitation dynamics and shared background knowledge exhibit faster and higher semantic stability than tagging streams which are generated via imitation dynamics or natural language phenomena alone.

Our findings are relevant for researchers interested in developing more sophisticated methods for assessing the semantic stability of tagging streams, and for practitioners interested in assessing the extent of semantic stabilization in social tagging systems at system scale.
Acknowledgments.
We thank Dr. William Webber for assistance with his RBO metric and Dr. Harry Halpin for assistance with his semantic stability measure. This work is in part funded by the FWF Austrian Science Fund Grant I677. Claudia Wagner is a recipient of a DOC-fForte fellowship of the Austrian Academy of Sciences.

9. REFERENCES

[1] J. Alstott, E. Bullmore, and D. Plenz. powerlaw: a Python package for analysis of heavy-tailed distributions. 2013.
[2] D. Bollen and H. Halpin. The role of tag suggestions in folksonomies. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, HT '09, pages 359–360, New York, NY, USA, 2009. ACM.
[3] C. Cattuto, V. Loreto, and L. Pietronero. Semiotic dynamics in online social communities. In The European Physical Journal C, pages 33–37. Springer-Verlag, 2006.
[4] C. Cattuto, V. Loreto, and L. Pietronero. Semiotic dynamics in online social communities. In The European Physical Journal C (accepted), pages 33–37. Springer-Verlag, 2006.
[5] C. Cattuto, V. Loreto, and L. Pietronero. Semiotic dynamics and collaborative tagging. Proceedings of the National Academy of Sciences, 104(5):1461–1464, 2007.
[6] N. Chomsky and G. Miller. Finitary models of language users. In Luce, Bush, and Galanter, editors, Handbook of Mathematical Psychology 2, pages 419–491, New York, New York, 1963. Wiley and Sons.
[7] A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Rev., 51(4):661–703, Nov. 2009.
[8] A. Cohen, R. N. Mantegna, and S. Havlin. Numerical analysis of word frequencies in artificial and natural language texts. Fractals, 1997.
[9] K. Dellschaft and S. Staab. An epistemic dynamic model for tagging systems. In HT '08: Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, pages 71–80, New York, NY, USA, 2008. ACM.
[10] R. Ferrer i Cancho and R. V. Solé. The small world of human language. Proceedings of The Royal Society of London. Series B, Biological Sciences, 268:2261–2266, 2001.
[11] R. Ferrer-i-Cancho and B. Elvevåg. Random texts do not exhibit the real Zipf's law-like rank distribution. PLoS ONE, 5(3):e9411+, Mar. 2010.
[12] W.-T. Fu, T. Kannampallil, R. Kang, and J. He. Semantic imitation in social tagging. ACM Trans. Comput.-Hum. Interact., 17(3):12:1–12:37, July 2010.
[13] S. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, April 2006.
[14] O. Görlitz, S. Sizov, and S. Staab. PINTS: peer-to-peer infrastructure for tagging systems. In Proceedings of the 7th International Conference on Peer-to-Peer Systems, IPTPS'08, page 19, Berkeley, CA, USA, 2008. USENIX Association.
[15] T. R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. Int. J. Hum.-Comput. Stud., 43(5-6):907–928, Dec. 1995.
[16] H. Halpin, V. Robu, and H. Shepherd. The complex dynamics of collaborative tagging. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages 211–220, New York, NY, USA, 2007. ACM.
[17] C. T. Kello, G. D. A. Brown, R. Ferrer-i-Cancho, J. G. Holden, K. Linkenkaer-Hansen, T. Rhodes, and G. C. Van Orden. Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14(5):223–232, May 2010.
[18] M. E. I. Kipp and G. D. Campbell. Patterns and inconsistencies in collaborative tagging systems: An examination of tagging practices. Nov. 2006.
[19] C. Körner, D. Benz, A. Hotho, M. Strohmaier, and G. Stumme. Stop thinking, start tagging: tag semantics emerge from collaborative verbosity. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 521–530, New York, NY, USA, 2010. ACM.
[20] W. Li. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, pages 1842–1845, 1992.
[21] N. Lin, D. Li, Y. Ding, B. He, Z. Qin, J. Tang, J. Li, and T. Dong. The dynamic features of Delicious, Flickr, and YouTube. J. Am. Soc. Inf. Sci. Technol., 63(1):139–162, Jan. 2012.
[22] J. Lorince and P. M. Todd. Can simple social copying heuristics explain tag popularity in a collaborative tagging system? In Proceedings of the 5th Annual ACM Web Science Conference, WebSci '13, pages 215–224, New York, NY, USA, 2013. ACM.
[23] G. Macgregor and E. McCulloch. Collaborative tagging as a knowledge organisation and resource discovery tool. Library Review, 55(5), in press.
[24] A. Mathes. Folksonomies: Cooperative classification and communication through shared metadata. June 2004. Accessed: 2013-07-11.
[25] P. Mika. Ontologies are us: A unified model of social networks and semantics. Web Semant., 5(1):5–15, Mar. 2007.
[26] M. A. Montemurro and D. Zanette. Frequency-rank distribution of words in large text samples: phenomenology and models. Glottometrics, 4:87–99, 2002.
[27] A. Rapoport. Zipf's law revisited. Studienverlag Bockmeyer, 1982.
[28] C. Schmitz, A. Hotho, R. Jäschke, and G. Stumme. Mining association rules in folksonomies. In Data Science and Classification: Proc. of the 10th IFCS Conf., Studies in Classification, Data Analysis, and Knowledge Organization, pages 261–270. Springer, 2006.
[29] S. Sen, S. K. Lam, A. M. Rashid, D. Cosley, D. Frankowski, J. Osterhouse, F. M. Harper, and J. Riedl. tagging, communities, vocabulary, evolution. In Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, CSCW '06, pages 181–190, New York, NY, USA, 2006. ACM.
[30] L. Specia and E. Motta. Integrating folksonomies with the semantic web. In Proceedings of the 4th European Conference on The Semantic Web: Research and Applications, ESWC '07, pages 624–639, Berlin, Heidelberg, 2007. Springer-Verlag.
[31] C. Wagner, S. Asur, and J. Hailpern. Religious politicians and creative photographers: Automatic user categorization in Twitter. In ASE/IEEE International Conference on Social Computing (SocialCom 2013), 2013.
[32] W. Webber, A. Moffat, and J. Zobel. A similarity measure for indefinite rankings. ACM Trans. Inf. Syst., 28(4):20:1–20:38, Nov. 2010.
[33] G. U. Yule. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. 213(402-410):21–87, Jan. 1925.
[34] G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, 1949.
[35] A. Zubiaga, C. Körner, and M. Strohmaier. Tags vs shelves: from social tagging to social classification. In