Signed Networks in Social Media
Jure Leskovec
Stanford [email protected]
Daniel Huttenlocher
Cornell [email protected]
Jon Kleinberg
Cornell [email protected]
ABSTRACT
Relations between users on social media sites often reflect a mixture of positive (friendly) and negative (antagonistic) interactions. In contrast to the bulk of research on social networks that has focused almost exclusively on positive interpretations of links between people, we study how the interplay between positive and negative relationships affects the structure of on-line social networks. We connect our analyses to theories of signed networks from social psychology. We find that the classical theory of structural balance tends to capture certain common patterns of interaction, but that it is also at odds with some of the fundamental phenomena we observe, particularly related to the evolving, directed nature of these on-line networks. We then develop an alternate theory of status that better explains the observed edge signs and provides insights into the underlying social mechanisms. Our work provides one of the first large-scale evaluations of theories of signed networks using on-line datasets, as well as providing a perspective for reasoning about social media sites.
Author Keywords
signed networks, structural balance, status theory, positive edges, negative edges, trust, distrust.
ACM Classification Keywords
H.5.3 Information Systems: Group and Organization Interfaces: Web-based interaction.
General Terms
Human Factors, Measurement, Design.
INTRODUCTION
Social network analysis provides a useful perspective on a range of social computing applications. The structure of networks arising in such applications offers insights into patterns of interactions, and reveals global phenomena at scales that may be hard to identify when looking at a finer-grained resolution. At the same time, there is an ongoing challenge in adapting such network approaches to the study of social computing: users develop rich relationships with one another in these settings, while network analyses generally reduce these complex relationships to the existence of simple pairwise links. It is a fundamental research problem to bridge the gap between the richness of the existing relationships and the stylized nature of network representations of these relationships.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CHI 2010, April 10–15, 2010, Atlanta, Georgia, USA. Copyright 2010 ACM 978-1-60558-929-9/10/04...
The main focus of our work here is to examine the interplay between positive and negative links in social media, a dimension of on-line social network analysis that has been largely unexplored. With relatively few exceptions (e.g., [1, 15, 16]), research in on-line social networks has focused on contexts in which the interactions have largely only positive interpretations, that is, connecting people to their friends, fans, followers, and collaborators. But in many settings it is important to also explicitly take negative relations into consideration, especially when studying interactions in social media: discussion lists are filled with controversy and disagreement, and social-networking sites harbor antagonism alongside amity. The richness of a social network in such cases generally consists of a mixture of both positive and negative interactions, co-existing in a single structure.
We aim to develop a better understanding of the role that network structure plays when some links between people are positive while others are negative. For instance, in on-line rating sites such as Epinions, people can give both positive and negative ratings not only to items but also to other raters. In on-line discussion sites such as Slashdot, users can tag other users as “friends” and “foes”. Our approach here is to adapt and extend theories from social psychology to analyze these types of signed networks as they arise in social computing applications. These theories enable us to characterize the differences between the observed and predicted configurations of positive and negative links in on-line social networks. We also use contrasts between the theories to draw inferences about how links are being used in particular social computing applications. In addition to insights into the applications themselves, our studies provide, to the best of our knowledge, some of the first large-scale evaluations of these social-psychological theories via on-line datasets.
Positive and negative links in on-line data.
To carry out such an investigation, we need two fundamental ingredients: (i) large-scale datasets from social applications where the sign of each link (whether it is positive or negative) can be reliably determined, and (ii) theories of signed networks that help us reason about how different patterns of positive and negative links provide evidence for the expression of different kinds of relationships across these applications.
[Figure 1. Undirected signed triads: T_3 (+ + +), T_1 (+ − −), T_2 (+ + −), T_0 (− − −). Based on the number of positive edges, we label triads with an odd number of pluses as balanced (T_3, T_1), and triads with an even number of positive edges (T_2, T_0) as unbalanced.]
We investigate social network structures from three widely-used Web sites. The first is the trust network of Epinions, where users create signed directed relations to each other indicating trust or distrust. The second is the social network of the technology blog Slashdot, where users designate others as “friends” or “foes.” The third is the network defined by votes for Wikipedia admin candidates. When a Wikipedia user is considered for a promotion to the status of an admin, the community is able to cast public votes in favor of or against the promotion of this admin candidate. We view a positive vote as corresponding to a positive link from the voter to the candidate, and a negative vote as a negative link. The Epinions and Slashdot networks are explicitly presented to users as social networking features of the sites, whereas in the case of Wikipedia the network interpretation is implicit.
The meanings of positive and negative signs are different across these settings, and this is precisely the point: we wish to use theories of signed edges to evaluate how the positive and negative edges are being used in each setting, and to identify commonalities and differences in the underlying networks in relatively different application contexts. Moreover, while the current work focuses on domains in which the signs of edges are overtly denoted (either explicitly by direct linking, or implicitly through actions such as voting on Wikipedia), we believe the underlying issues reach more broadly into any application where positive and negative attitudes between users can be conveyed, such as through sentiment in text [20].
Theories of signed networks: Balance.
We analyze these on-line signed networks using two different theories, and a central issue in our study is the extent to which each of these theories provides a plausible explanation for the structure and dynamics of the observed networks.
The first of these theories is structural balance theory, which originated in social psychology in the mid-20th century. As formulated by Heider in the 1940s [14], and subsequently cast in graph-theoretic language by Cartwright and Harary [4], structural balance considers the possible ways in which triangles on three individuals can be signed, and posits that triangles with three positive signs (three mutual friends, Figure 1, T_3) and those with one positive sign (two friends with a common enemy, Figure 1, T_1) are more plausible, and hence should be more prevalent in real networks, than triangles with two positive signs (two enemies with a common friend, T_2) or none (three mutual enemies, T_0). Balanced triangles with three positive edges exemplify the principle that “the friend of my friend is my friend,” whereas those with one positive and two negative edges capture the notions that “the friend of my enemy is my enemy,” “the enemy of my friend is my enemy,” and “the enemy of my enemy is my friend.”
Structural balance theory has been developed extensively in the time since this initial work [21], including the formulation of a variant, weak structural balance, proposed by Davis in the 1960s as a way of eliminating the assumption that “the enemy of my enemy is my friend” [7]. In particular, weak structural balance posits that only triangles with exactly two positive edges are implausible in real networks, and that all other kinds of triangles should be permissible.
Theories of signed networks: Status.
Balance theory can be viewed as a model of likes and dislikes. However, as Guha et al. observe in the context of Epinions [13], a signed link from A to B can have more than one possible interpretation, depending on A’s intention in creating the link. In particular, a positive link from A may mean, “B is my friend,” but it also may mean, “I think B has higher status than I do.” Similarly, a negative link from A to B may mean “B is my enemy” or “I think B has lower status than I do.”
Here we develop this idea into a new theory of status, which provides a different organizing principle for directed networks of signed links. In this theory of status, we consider a positive directed link to indicate that the creator of the link views the recipient as having higher status; and a negative directed link indicates that the recipient is viewed as having lower status. These relative levels of status can then be propagated along multi-step paths of signed links, often leading to different predictions than balance theory.
Comparing the two theories.
To give a sense for how the differences between status and balance arise, consider the situation in which a user A links positively to a user B, and B in turn links positively to a user C. If C then forms a link to A, what sign should we expect this link to have? Balance theory predicts that since C is a friend of A’s friend, we should see a positive link from C to A. Status theory, on the other hand, predicts that A regards B as having higher status, and B regards C as having higher status, so C should regard A as having low status and hence be inclined to link negatively to A. In other words, the two theories suggest opposite conclusions in this case.
Thus balance theory predicts that certain types of triads such as all-positive cycles should be overrepresented compared to chance, whereas status theory makes predictions that often differ. We study all the possible types of signed triads and the predictions made by the different theories. In doing so we consider several experimental conditions, including both directed and undirected networks, as well as both respecting and ignoring the order in which edges were created. For each such experimental condition we consider whether the observed number of triads of each type is overrepresented or underrepresented compared to chance, and contrast that with the predictions made by the balance and status theories. This analysis gives us a picture of the aggregate patterns of links in the social networks, and the degree to which they are explained in terms of each theory.
Summary of Findings: Comparison of Balance and Status.
Both of these theories concern relationships between people; by adapting them to our on-line network datasets, they provide potentially informative perspectives on the link structures we find there.
Balance theory was initially intended as a model for undirected networks, although it has been commonly applied to directed networks by simply disregarding the directions of the links [21]. When we do this, we find significant alignment between the observed network data and Davis’s notion of weak structural balance: triangles with exactly two positive edges are massively underrepresented in the data relative to chance, while triangles with three positive edges are massively overrepresented. In two of the three datasets, triangles with three negative edges are also overrepresented, which is at odds with Heider’s formulation of balance theory. These findings are already intriguing, since it has traditionally been difficult to evaluate the predictions of structural balance theory on large network datasets. Rather, empirical investigations to date have generally focused on small networks where social relations can be observed through direct interaction with the individuals involved (see e.g. [8]). The trouble with assessing structural balance at small scales is that one expects its predictions to be aggregate rather than absolute: one expects to see certain kinds of triangles as statistically more abundant or less abundant in the data, and the significance of such biases towards certain kinds of triangles can stand out much more clearly when they are accumulated over a large amount of data.
Ultimately, however, we would like to understand the networks in these on-line systems as directed structures that evolve over time. When we view the network data in this way, our main conclusion is that the theory of status is more effective at explaining local patterns of signed links, and that it naturally extends to capture richer aspects of user behavior, including heterogeneity in their linking tendencies. For example, in the case offered as an illustration above, where user A links positively to user B and user B links positively to user C, we find that negative links from C to A are massively overrepresented relative to chance, with positive links correspondingly underrepresented.
Implications.
There are several potentially interesting implications of our results. First, the comparison of balance and status provides insights into ways in which people use linking mechanisms in social computing applications. In particular, there are important domains such as rating reviewers on Epinions and voting for admins on Wikipedia in which such links appear, in aggregate, to be used more dominantly for expressions of status than for expressions of likes and dislikes.
The contrast between balance and status is also related to the distinction between undirected and directed interpretations of links. Our findings suggest that it is important to understand the roles of different theories in both undirected and directed representations of networks. Indeed, the theory of status only makes sense with directed links, since it posits a status differential from the creator of a link to its recipient, while the theory of balance has been applied in both undirected and directed settings (e.g., [21]). The fact that (weak) balance is broadly consistent with the undirected representation of our network data, while status is more consistent with the directed representation, shows that it is possible for different theories to be appropriate to different levels of resolution in the representation of a single network.
In the final part of the paper, we describe further structural investigations that provide insight into ways in which signed links are used in these applications. First, we find that aspects of the theory of balance hold more strongly on the subset of links in these networks that are reciprocated, consisting of directed links in both directions between two users. This suggests that reciprocal link formation may follow a different pattern of use in these systems than unreciprocated link formation. However, it is important to note that such reciprocal relations account for only a small proportion of the links between people on these sites.
Second, we find a connection between the sign of a link and the extent to which it is embedded [12], i.e., with the two endpoints having links to many common neighbors. A link is significantly more likely to be positive when its two endpoints have multiple neighbors (of either sign) in common. This observation is consistent with qualitative notions of social capital [3, 5]: users with common neighbors have relations that are “on display” in a social sense, and hence have greater implicit pressure to remain positive. Indeed, in the three different social applications that we study, this effect is strongest in the case of voting for Wikipedia admins, which is the setting that makes the relations most prominently visible to users. This suggests some of the ways in which the presence of common neighbors, and more overt forms of public display, can have an effect on the use of signed links.
These findings about aggregate structural properties also begin to address a broad and largely open issue, which is to understand the sources of individual variation in linking behavior. While reciprocation and embeddedness are only two dimensions along which to explore such variation, we believe that the definitions and analysis pursued here can help in framing further investigation of questions regarding individual variation.
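As an illustration of the embeddedness notion used above, the following minimal sketch counts, for each signed link, the neighbors its two endpoints have in common, and then reports the fraction of positive links at each embeddedness level. The edge-list format `(src, dst, sign)`, the function names, and the choice to ignore link direction when collecting neighbors are assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict


def embeddedness(edges):
    """Map each directed edge (u, v) to the number of neighbors u and v share.

    Assumption: neighbors are collected ignoring direction and sign.
    """
    nbrs = defaultdict(set)
    for u, v, _sign in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return {(u, v): len(nbrs[u] & nbrs[v]) for u, v, _sign in edges}


def positive_fraction_by_embeddedness(edges):
    """Fraction of positive edges among edges with the same embeddedness."""
    emb = embeddedness(edges)
    positive = defaultdict(int)
    total = defaultdict(int)
    for u, v, sign in edges:
        k = emb[(u, v)]
        total[k] += 1
        if sign > 0:
            positive[k] += 1
    return {k: positive[k] / total[k] for k in sorted(total)}


# Hypothetical toy network: a positive triangle a-b-c plus one negative link c -> d.
toy_edges = [("a", "b", +1), ("a", "c", +1), ("b", "c", +1), ("c", "d", -1)]
fractions = positive_fraction_by_embeddedness(toy_edges)
```

On real data one would typically bucket the embeddedness values rather than report every level separately.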
RELATED WORK
There is by now a large and rapidly growing literature on the analysis of social networks arising in on-line domains [18]; as we noted at the outset, this line of work has almost exclusively treated networks as implicitly having positive signs only. For example, portions of our analysis can be viewed as variants on the problem of link prediction [17] and tie-strength prediction [10], but in each case adapted to take the signs of links into account.
Two recent papers in the analysis of on-line social networks stand out as taking the signs of links into account. Brzozowski et al. study the positive and negative relationships that exist on ideologically oriented sites such as Essembly [1], but with the goal of predicting outcomes of group votes rather than the broader organization of the social network. Kunegis et al. study the friend/foe relationships on Slashdot, and compute global network properties [15], but do not evaluate theories of balance and status as we do here.
             Epinions     Slashdot    Wikipedia
Nodes         119,217       82,144        7,118
Edges         841,200      549,202      103,747
+ edges         85.0%        77.4%        78.7%
− edges         15.0%        22.6%        21.2%
Triads     13,375,407    1,508,105      790,532
Table 1. Dataset statistics.
Symbol      Meaning
T_i         Signed triad, also the number of triads of type T_i
Δ           Total number of triads in the network
p           Fraction of positive edges in the network
p(T_i)      Fraction of triads T_i, p(T_i) = T_i / Δ
p_0(T_i)    A priori prob. of T_i (based on sign distribution)
E[T_i]      Expected number of triads T_i, E[T_i] = p_0(T_i) Δ
s(T_i)      Surprise, s(T_i) = (T_i − E[T_i]) / \sqrt{Δ p_0(T_i) (1 − p_0(T_i))}
Table 2. Table of symbols.
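To make the a priori probability in Table 2 concrete, here is one way to write it out. The sketch assumes that the three edge signs of a triad are independent draws, each positive with probability p. That is an approximation to shuffling signs over a large network, and an assumption of this example rather than a formula stated in the text. Triad types are indexed by their number of positive edges.

```latex
\begin{align*}
p_0(T_3) &= p^3,          & p_0(T_2) &= 3p^2(1-p), \\
p_0(T_1) &= 3p(1-p)^2,    & p_0(T_0) &= (1-p)^3,
\end{align*}
\qquad\text{so that}\qquad \sum_{i=0}^{3} p_0(T_i) = \bigl(p + (1-p)\bigr)^3 = 1 .
```

With p = 0.8, for example, this gives p_0(T_3) = 0.512, p_0(T_2) = 0.384, p_0(T_1) = 0.096, and p_0(T_0) = 0.008.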
There are also large bodies of work involving negative relationships in on-line domains that pursue directions different from our network focus here. One line of work focuses on norms to control deviant behavior in on-line communities (e.g. [6] and the references therein). In a different direction, a large body of recent work in sentiment analysis [20] has studied on-line textual data in which individuals can express both positive and negative attitudes toward one another, but without addressing the consequences for network structure.
The datasets we study here have also been investigated by researchers for other purposes. Guha et al. study the trust network of Epinions [13]. Lampe et al. study the user rating mechanisms on Slashdot [16]. Burke and Kraut study the voting process that produces our Wikipedia signed network [2], but with the goal of modeling election outcomes.
Finally, the notion of status plays a role in many lines of work in the social sciences, such as the role that behavior-status theory plays in social exchange theory [9, 22]. However, these notions are distinct from the ways in which we formulate definitions of status as a counterpart to balance in signed directed networks.
DATASET DESCRIPTION
As described above, we consider three large online social networks where links are explicitly positive or negative: (i) the trust network of the Epinions product review Web site, where users can indicate their trust or distrust of the reviews of others; (ii) the social network of the blog Slashdot, where a signed link indicates that one user likes or dislikes the comments of another; and (iii) the voting network of Wikipedia, where a signed link indicates a positive or negative vote by one user on the promotion to admin status of another.
Table 1 gives statistics for all three datasets. Our networks have on the order of tens to hundreds of thousands of nodes, and fewer than a million edges. In each network the edges are inherently directed, since we know which user created the edge. In all networks the background proportion of positive edges is about the same, with roughly 80% of the edges having a positive sign.
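The kind of summary reported in Table 1 can be sketched as follows for a signed, directed edge list. The `(src, dst, sign)` tuple format and the function name `dataset_statistics` are hypothetical choices for this example; the paper does not specify a data format.

```python
def dataset_statistics(edges):
    """Summarize a signed directed edge list of (src, dst, sign) tuples.

    sign is +1 for a positive link and -1 for a negative link.
    """
    nodes = set()
    positive = 0
    total = 0
    for src, dst, sign in edges:
        nodes.update((src, dst))
        total += 1
        if sign > 0:
            positive += 1
    frac_positive = positive / total if total else 0.0
    return {
        "nodes": len(nodes),
        "edges": total,
        "frac_positive": frac_positive,
        "frac_negative": (1.0 - frac_positive) if total else 0.0,
    }


# Hypothetical toy data: three users, four links, one of them negative.
toy_edges = [("u1", "u2", +1), ("u2", "u1", +1), ("u1", "u3", -1), ("u3", "u2", +1)]
stats = dataset_statistics(toy_edges)
```

The fraction of positive edges computed here plays the role of p in the analysis of undirected triads.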
ANALYSIS OF UNDIRECTED NETWORKS
We begin by analyzing the network data in an undirected representation, where we do not take the directions of links into account. In this context, we can evaluate the predictions of structural balance theory by considering the frequencies of different types of signed triads: sets of three nodes with signed edges among all pairs.
[Table 3. Number of balanced and unbalanced undirected triads. For each of Epinions, Slashdot, and Wikipedia, the table has one row per triad type, T_3 (+ + +), T_1 (+ − −), T_2 (+ + −), and T_0 (− − −), with columns |T_i|, p(T_i), p_0(T_i), and s(T_i).]
Table 3 gives the counts of the four possible signed undirected triads, while Table 2 summarizes the symbols we use throughout the paper. Let p denote the fraction of positive edges in the network. The four possible signed undirected triads are denoted T_3, T_1, T_2, and T_0 (Figure 1). Among all triads in the data, the number that are of type T_i is denoted |T_i| and the fraction of type T_i is denoted p(T_i). Now, we would like to compare this empirical frequency of triad types to the corresponding frequencies if edge signs were produced at random from the same background distribution of positive and negative signs. Thus, we shuffle the signs of all edges in the graph (keeping the fraction p of positive edges the same), and we let p_0(T_i) denote the expected fraction of triads that are of type T_i after this shuffling.
If p(T_i) > p_0(T_i), then triads of type T_i are overrepresented in the data relative to chance; if p(T_i) < p_0(T_i), then they are underrepresented. We also want to measure how significant this over- or underrepresentation is. Thus, we define the surprise s(T_i) to be the number of standard deviations by which the actual quantity of type-T_i triads differs from the expected number under the random-shuffling model. Due to the Central Limit Theorem the distribution of s(T_i) is approximately a standard normal distribution and so we would expect surprise on the order of tens to already be significant (under a standard normal, s(T_i) = 6 already gives a p-value below 10^{-8}).
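The procedure just described can be sketched in a few lines: count undirected signed triads by their number of positive edges, compare with the a priori fractions, and express the gap in standard deviations. This is a minimal sketch rather than the authors' implementation. The input format (a dict from `frozenset({u, v})` to a sign), all function names, and the treatment of the three signs in a triad as independent draws with probability p are assumptions of the example. It requires 0 < p < 1, at least one triad, sortable node labels, and no self-loops.

```python
from itertools import combinations
from math import erfc, sqrt


def count_signed_triads(signs):
    """Return [|T_0|, |T_1|, |T_2|, |T_3|], indexed by number of positive edges.

    signs maps frozenset({u, v}) -> +1 or -1 for each undirected edge.
    """
    nbrs = {}
    for pair in signs:
        u, v = tuple(pair)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    counts = [0, 0, 0, 0]
    for u in nbrs:
        for v, w in combinations(sorted(nbrs[u]), 2):
            # Requiring u < v < w counts every triangle exactly once.
            if u < v and frozenset((v, w)) in signs:
                edges = (frozenset((u, v)), frozenset((u, w)), frozenset((v, w)))
                counts[sum(signs[e] > 0 for e in edges)] += 1
    return counts


def triad_surprise(counts, p):
    """Surprise s(T_i) for i = 0..3, given triad counts and positive fraction p."""
    total = sum(counts)
    p0 = [(1 - p) ** 3, 3 * p * (1 - p) ** 2, 3 * p ** 2 * (1 - p), p ** 3]
    return [
        (counts[i] - p0[i] * total) / sqrt(total * p0[i] * (1 - p0[i]))
        for i in range(4)
    ]


def two_sided_p_value(s):
    """Two-sided tail probability of a standard normal at |s|."""
    return erfc(abs(s) / sqrt(2))


# Hypothetical toy graph on four nodes (not data from the paper).
toy_signs = {
    frozenset(e): s
    for e, s in [((1, 2), 1), ((1, 3), 1), ((2, 3), 1),
                 ((1, 4), -1), ((2, 4), -1), ((3, 4), 1)]
}
toy_counts = count_signed_triads(toy_signs)
toy_p = sum(s > 0 for s in toy_signs.values()) / len(toy_signs)
toy_surprise = triad_surprise(toy_counts, toy_p)
```

On a graph this small the surprise values carry no statistical weight. The normal approximation becomes meaningful only when the number of triads is large, as in Table 1.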
However, the values of surprise we find in our data are typically much larger. This means that due to the scale of the data and the large number of triads almost all our observations are statistically significant with p-values practically equal to zero.
We find that the all-positive triad T_3 is heavily overrepresented in all three datasets, and the triad T_2 consisting of two enemies with a common friend is heavily underrepresented. Based on the relative magnitudes of p(T_i) and p_0(T_i), we see that T_3 tends to be overrepresented by about 40% in all three datasets. Similarly, the unbalanced triad T_2 is underrepresented by about 75% in Epinions and Slashdot and 50% in Wikipedia. These observations so far fit well into Heider’s original notion of structural balance.
However, the relative abundances of triad types T_1 (single positive edge) and T_0 (all negative edges) differ between the datasets, and none of the datasets follow Heider’s theory in both having T_1 overrepresented and T_0 underrepresented. Thus, the picture is more consistent with Davis’s weaker notion of balance, where T_2 is viewed as implausible but there is no a priori reason to favor one of T_1 or T_0 over the other.
ANALYSIS OF EVOLVING DIRECTED NETWORKS
We now consider the networks in these systems as directed graphs, incorporating the fact that the links being created go from one user to another, with the sign of a link from A to B being generated by A. In the introduction, we discussed how the theories of balance and status offer competing interpretations for how we should expect such directed links to be signed. For example, as noted there, positive cycles (that is, directed triads with positive links from A to B to C to A) are underrepresented in the data. This conflicts with balance theory, but is consistent with status theory.
Timing and Diversity: Generative and Receptive Baselines.
Beyond just the directionality of links, there are additional features of the data that we take into account when evaluating these models. First, links are created at specific points in time, so rather than thinking of directed triads as existing in a static snapshot of the network, we consider the order in which links are added to the network. Thus, we study how directed triads form, as follows. When a user A links to a user B, suppose there is already a user X with the property that X has links to or from A, and also to or from B. This means there is a two-step semi-path from A to B through X (a path in which the directions of the edges do not matter), and the formation of the A-B link adds a shortcut to this path, producing a directed triad on A, B, and X.
Second, different users make use of positive and negative signs differently. At the most basic level, some users produce links almost exclusively of one sign or the other, while others produce a relatively even mix of both positive and negative links. We will refer to the overall fraction of positive signs that a user creates, considering all her links, as her generative baseline. Similarly, some users receive links that are almost exclusively of one sign or the other, while others receive a mix of signs. We will refer to the overall fraction of positive signs in the links a user receives as his receptive baseline. Given this, we should compare the abundance of positive and negative links to the generative and receptive baselines of the users producing and receiving these links.
Once we incorporate these aspects of the data, we discover further mysteries, beyond just the scarcity of positive cycles, that seem to call for alternatives to balance theory. For example, consider the case of joint positive endorsement: a situation in which a node X links positively to each of two nodes A and B. Suppose that in this case, A now forms a link to B (one of the triad contexts shown in Figure 2); should we expect there to be an elevated probability of the link being positive, or a reduced probability of the link being positive?
In fact, in our data, the question turns out to have a more subtle answer than either of these alternatives. The link that is produced in this situation is more likely to be positive than the generative baseline of A, but at the same time less likely to be positive than the receptive baseline of B. Balance theory, of course, makes a much more naive prediction: since A and B are both friends of X, they should be friends of each other. Can status theory explain this dual and opposite pair of deviations from the baselines of A and B?
We now show that in fact it can, and explaining how this works forms the motivation for a theory of how status effects can influence the signs of directed links.
Formulating a Theory of Status
Since the phenomenon we are trying to capture is subtle but in the end familiar from everyday life, we begin with a hypothetical example to motivate the subsequent definitions.
A Motivating Example.
Suppose we were to interview the players on a college soccer team: for certain players A, and certain teammates B of A, we ask, “How do you think the skill of player B compares to yours?” Suppose further that the players roughly agree on a ranking of each other by skill, which serves as an approximate (though not perfect) ranking of the team members by status. From the results of these interviews, we could produce a signed directed graph whose nodes are the players, and with a directed edge from A to B if we asked A for her opinion of B. A positive link from A to B would indicate that A thinks highly of B’s skill relative to her own, while a negative link would indicate that A thinks she is better than B.
If we were just given this signed directed graph, and knew nothing else about the soccer team, then we could still make inferences about the signs of links that we haven’t yet observed, using the context provided by the rest of the network. Suppose for example that we are about to ask player A’s opinion of another player B, but we don’t currently have A’s answer and hence don’t yet know the sign of the link from A to B. We can nonetheless make predictions about it from the links whose signs we do know, as follows. Suppose that we know from the data already collected that A and B have each received a positive evaluation from a third player X. Here is a pair of facts we could conjecture about the link from A to B, given the positive links from X to A and B.
• Since B has been positively evaluated by another team member, B is more likely than not to have above-average skill. Therefore, the evaluation that A gives B should be more likely to be positive than an evaluation given by A to a random team member.
• Since A has been positively evaluated by another team member, A is also more likely than not to have above-average skill. Therefore, the evaluation that A gives B should be less likely to be positive than an evaluation received by B from a random team member.
There are several subtleties here. First, we’re using the indirection provided by a third party X to make inferences about the relation between A and B, based on assumptions about status. Second, the context provided by X causes the sign of the A-B link to deviate from a random baseline in different directions depending on whether we’re looking at it from A’s point of view or B’s point of view. More precisely, since B has above-average skill, A will likely give B a higher evaluation than A would give to a random team member. On the other hand, since A has above-average skill, B is less likely to receive a positive evaluation from A than she would receive from a random team member. Despite the complexity of these conclusions, they reflect genuine and natural properties of status ordering among a group of people. They also agree with our observations about joint positive endorsement in the data mentioned above.
We turn now to the data, where we will find that the users of these on-line networks create signed links in ways that correspond closely to the behavior of the players on our hypothetical soccer team. But extracting this finding from the data will require formulating a sequence of definitions that captures the intuition suggested by this example.
Contextualized Links.
The first portion of our definitions captures the idea that we will evaluate the sign of a link created from A to B in the context of A and B’s relations to additional nodes X with whom they have links. (For example, the node X in our example who jointly endorses A and B.) Thus, we define a contextualized link (more briefly, a c-link) to be a triple (A, B; X) with the property that a link forms from A to B after each of A and B already has a link either to or from X. Overall there are sixteen different types of c-links, as the edge between X and A can go in either direction and have either sign yielding four possibilities, and similarly for the edge between X and B, for a total of 4 · 4 = 16. For each of these types of c-links we are interested in the frequencies of positive versus negative labels for the edge from A to B. Figure 2 shows all the possible types of c-links, labeled t_1–t_16.
Now, for a particular type of c-link, we look at the set of all c-links (A, B; X) of this type, and ask: what fraction of the links from A to B in this set are positive? Moreover, how does this fraction compare to what one would expect from the generative baselines of the nodes A and the receptive baselines of the nodes B that are involved in the creation of these A-B links? If we can quantify the answer to this question in our data, we can look for effects like we saw in our motivating example. There, in the case of positive links from X to A and B, we believed the likelihood of a positive A-B edge should exceed the generative baseline of A but should lie below the receptive baseline of B.
Let’s consider a particular type t of c-link, and suppose that (A_1, B_1; X_1), (A_2, B_2; X_2), . . . , (A_k, B_k; X_k) is a list of all instances of this type t of c-link in our data. We define the generative baseline for this type t to be the sum of the generative baselines of all nodes A_i.
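The definitions above can be sketched as code: compute each user's generative and receptive baseline, enumerate c-links from a time-ordered edge list, and accumulate per-type baselines. This is an illustrative sketch, not the authors' implementation. The `(src, dst, sign)` format, the function names, and the context encoding (`"out"` for an edge pointing to X, `"in"` for an edge coming from X) are hypothetical. When a pair of users is linked in both directions, this sketch emits one context per existing direction, which is an assumption about a case the text does not settle. The `gen_surprise` helper measures the deviation in standard deviations and assumes each link sign is an independent draw from the creator's baseline.

```python
from collections import defaultdict
from math import sqrt


def baselines(edges):
    """Per-user generative (outgoing) and receptive (incoming) positive fractions."""
    gen_pos, gen_tot = defaultdict(int), defaultdict(int)
    rec_pos, rec_tot = defaultdict(int), defaultdict(int)
    for src, dst, sign in edges:
        gen_tot[src] += 1
        rec_tot[dst] += 1
        if sign > 0:
            gen_pos[src] += 1
            rec_pos[dst] += 1
    gen = {u: gen_pos[u] / n for u, n in gen_tot.items()}
    rec = {u: rec_pos[u] / n for u, n in rec_tot.items()}
    return gen, rec


def _links_with(seen, u, x):
    """Existing links between u and x as (direction, sign) pairs."""
    found = []
    if (u, x) in seen:
        found.append(("out", seen[(u, x)]))
    if (x, u) in seen:
        found.append(("in", seen[(x, u)]))
    return found


def c_links(edges):
    """Yield (A, B, X, context, sign) for edges given in creation order."""
    seen = {}                  # (u, v) -> sign of the already created edge u -> v
    nbrs = defaultdict(set)    # users linked in either direction so far
    for a, b, sign in edges:
        for x in nbrs[a] & nbrs[b]:
            for ctx_a in _links_with(seen, a, x):
                for ctx_b in _links_with(seen, b, x):
                    yield a, b, x, (ctx_a, ctx_b), sign
        seen[(a, b)] = sign
        nbrs[a].add(b)
        nbrs[b].add(a)


def summarize_by_type(edges):
    """Per c-link type: instance count, positive closings, summed baselines."""
    gen, rec = baselines(edges)
    stats = defaultdict(lambda: {"count": 0, "positive": 0, "gen_baseline": 0.0,
                                 "gen_variance": 0.0, "rec_baseline": 0.0})
    for a, b, _x, ctx, sign in c_links(edges):
        s = stats[ctx]
        s["count"] += 1
        s["positive"] += int(sign > 0)
        s["gen_baseline"] += gen[a]
        s["gen_variance"] += gen[a] * (1 - gen[a])
        s["rec_baseline"] += rec[b]
    return dict(stats)


def gen_surprise(s):
    """Deviation of observed positives from the generative baseline, in std. devs."""
    if s["gen_variance"] == 0:
        return 0.0
    return (s["positive"] - s["gen_baseline"]) / sqrt(s["gen_variance"])


# Hypothetical toy history: X endorses A and B, then A links to C and to B.
toy_edges = [("x", "a", +1), ("x", "b", +1), ("a", "c", -1), ("a", "b", +1)]
toy_stats = summarize_by_type(toy_edges)
joint_endorsement = (("in", 1), ("in", 1))
```

In the toy history the only c-link is (a, b; x) under joint positive endorsement. The generative baseline of a is 0.5, so the positive a-b link lies above it. The edge list must be a list (it is traversed twice) and must be sorted by creation time.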
The generative baseline for type t is simply the expected number of positive edges we would get if we let each A_i-B_i link form according to the generative baseline of A_i. We then define the generative surprise s_g(t) for this type t to be the (signed) number of standard deviations by which the actual number of positive A_i-B_i edges in the data differs above or below this expectation. In other words, if the context provided by the node X and its links with A and B had no effect on the sign of the A-B link being formed, so that each node A_i simply drew the sign of her link to B_i according to her generative baseline, then we should expect to see a generative surprise of 0 for this type t.

We set up the corresponding definitions for the nodes B_i as the recipients of the links. We define the receptive baseline for this type t of c-link to be the sum of the receptive baselines of all nodes B_i, and we define the receptive surprise s_r(t) to be the (signed) number of standard deviations by which the actual number of positive A_i-B_i edges in the data differs above or below this expectation.

[Figure 2, top: diagrams of the sixteen contexts (A, B; X), t_1 to t_16.]
[Figure 2, bottom: the Count, P(+), s_g and s_r columns did not survive in this copy; the consistency columns are shown below (✓ consistent, ◦ inconsistent).]

    t_i     B_g   B_r   S_g   S_r
    t_1     ✓     ✓     ✓     ✓
    t_2     ✓     ✓     ✓     ◦
    t_3     ✓     ✓     ◦     ✓
    t_4     ◦     ◦     ✓     ✓
    t_5     ◦     ✓     ✓     ✓
    t_6     ◦     ◦     ✓     ✓
    t_7     ✓     ✓     ✓     ✓
    t_8     ✓     ◦     ✓     ✓
    t_9     ✓     ◦     ✓     ✓
    t_10    ✓     ✓     ✓     ✓
    t_11    ◦     ◦     ✓     ✓
    t_12    ◦     ✓     ✓     ✓
    t_13    ◦     ◦     ✓     ✓
    t_14    ◦     ◦     ✓     ◦
    t_15    ◦     ◦     ◦     ✓
    t_16    ✓     ◦     ✓     ◦
    Number of correct predictions:
            8     7     14    13

Figure 2. Top: All contexts (A, B; X). Red edge is the edge that closes the triad. Bottom: Surprise values and predictions based on the competing theories of structural balance and status. t_i refers to triad contexts above; Count: number of contexts t_i; P(+): prob. that closing red edge is positive; s_g: surprise of edge initiator giving a positive edge; s_r: surprise of edge destination receiving a positive edge; B_g: consistency of balance with generative surprise; B_r: consistency of balance with receptive surprise; S_g: consistency of status with generative surprise; S_r: consistency of status with receptive surprise.

Incorporating the Role of Status.
Finally, we bring the role of status into this theory. For this, it is useful to return once more to our motivating example. When a player X on our hypothetical soccer team gave positive evaluations to both A and B, we concluded, in the absence of any further information, that A and B were likely to have above-average status. We would have concluded the same thing had A and B given negative evaluations to X. On the other hand, if X had evaluated A and B negatively, or had they evaluated X positively, then we should have concluded that A and B were more likely than not to have below-average status.

This reasoning provides a way to assign status values to A and B in any type of c-link, as follows. We first assign the node X a status of 0. Then, if X links positively to A, or A links negatively to X, we assign A a status of 1; otherwise, we assign A a status of −1. We use the same rule for assigning a status of 1 or −1 to B. Thus we say that the generative surprise for type t is consistent with status if B's status has the same sign as the generative surprise: in this case, high-status recipients B receive more positive evaluations than would be expected from the generative baseline of the node A producing the link. We say that the receptive surprise for type t is consistent with status if A's status has the opposite sign from the receptive surprise: high-status generators of links A produce fewer positive evaluations than would be expected from the receptive baseline of the node B receiving the link.

Results
We now evaluate the predictions of these theories on the two networks, Epinions and Wikipedia, for which we have data on the exact order in which the links were created. We focus our discussion on Epinions, for which the data is an order of magnitude larger; the results are quite similar on the smaller Wikipedia dataset, with differences that we note below.

We consider four theories to explain the signs of the links that are produced. The first two are the consistency of status with generative and receptive surprise, as just defined. The other two theories are the analogous forms of consistency with Heider's original notion of balance. Specifically, we say that Heider balance is consistent with generative surprise for a particular c-link type if the sign of the generative surprise is equal to the sign of the edge as predicted by balance. Analogously, we say that Heider balance is consistent with receptive surprise for a particular c-link type if the sign of the receptive surprise is equal to the sign of the edge as predicted by balance.

We find that the predictions of status with respect to both generative and receptive surprise perform much better against the data than the predictions of structural balance. Indeed, status is consistent with generative and receptive surprise on the vast majority of c-link types; as shown in Figure 2, it is consistent on 14 and 13 types respectively. This includes the case of joint endorsement (type t in Figure 2), which is in fact the most abundant type of c-link in the data, and also includes the natural counterpart of joint endorsement, in which A and B each link negatively to X (type t ). It also includes the case of a positive cycle (type t ), discussed earlier as well.

Footnote. On the Wikipedia dataset, the results for receptive surprise are almost identical; status is consistent with receptive surprise on all c-link types except for the same three exceptional cases as Epinions, t , t , and t , and one more: t . We find this close alignment quite surprising given the very different kinds of activities that the Epinions and Wikipedia links represent. On Wikipedia, status is also consistent with generative surprise on 12 of the 16 triad types, though here the types where there is inconsistency differ more from Epinions: t (as in Epinions), t , t , and t .

Structural balance is a much weaker fit to the data: balance is consistent with generative surprise for only 8 of the 16 types of c-links, and consistent with receptive surprise for only 7 of the 16. We also evaluated consistency of generative and receptive surprise with respect to Davis's weaker notion of balance, with similar results. The one subtlety in evaluating the data with respect to Davis balance is that Davis's theory does not predict the sign of the A-B edge in c-link types where the two existing edges with X are both negative (t , t , t , and t ): for these triads, either a positive or a negative A-B link would be consistent with Davis's theory, and so no prediction can be made. Thus, we evaluate consistency of Davis balance with respect to generative and receptive surprise only on the remaining 12 c-link types; here, we find consistency in 6 and 7 of the 12 cases respectively. This too is much weaker than the predictions of status.

We also consider the structure of the cases in which status theory fails to make a correct prediction, analyzing the possible strengthenings of the theory that this might hint at. First, we observe that one of the two c-link types where status is inconsistent with generative surprise is the configuration in which A and B each link positively to X (type t ). This is one of the most basic settings for structural balance in Heider's work: if two people each like a third party, then one should expect them to have positive relations. It thus suggests where users of these systems may be relying on balance-based reasoning more than status-based reasoning.

We can get further insights from the cases where status theory is inconsistent with the data. In particular, the 16 c-link types can be divided into four groups of four each, based on whether A has high or low status relative to X, and whether B has high or low status relative to X. Status theory makes its mistakes almost exclusively on the c-link types where A and B are both posited to have low status relative to X. This corresponds to the types t , t , t , and t ; we observe that with respect to generative surprise, both of status theory's mistakes occur on types of this form, and with respect to receptive surprise, two of status theory's three mistakes occur on types of this form.

Even further, the mistakes of status with respect to generative and receptive surprise on these types constitute natural "duals" to each other. Note first that if we reverse both the direction and the sign of an edge, we preserve the status relation of the two endpoints (e.g. a positive link from A to X or a negative link from X to A both suggest that A has lower status than X). With this in mind, we observe that if we take the types t and t on which status theory makes its two mistakes with respect to generative surprise, and we reverse the directions and signs of both edges involving X, we get the c-link types t and t ; these are the other two c-link types where A and B have low status relative to X, and they are two of the three types on which status theory makes mistakes with respect to receptive surprise.

It is thus natural to conjecture that the use of signed links deviates most strongly from status theory when A is predicted to impute low status to both herself and B. Now that this behavioral asymmetry has been identified in the data, via our formulation of this theory, developing a more refined theory of status that takes this asymmetry into account is an interesting direction for further work.

    Epinions     Count       Probability
    P(+|+)       (missing)   (missing)
    P(−|+)       (missing)   (missing)
    P(+|−)       (missing)   (missing)
    P(−|−)       560         0.308

    Wikipedia    Count       Fraction
    P(+|+)       (missing)   (missing)
    P(−|+)       145         0.055
    P(+|−)       193         0.706
    P(−|−)       80          0.294

Table 4. Edge reciprocation. Given that the first edge was of sign X, P(Y|X) gives the probability that the reciprocated edge is of sign Y.

RECIPROCATION OF DIRECTED EDGES
Thus far we have found that balance theory is a reasonable approximation to the structure of signed networks when they are viewed as undirected graphs, while status theory better captures many of the properties when the networks are viewed in more detail as directed graphs that grow over time.

To understand the boundary between these two theories and where they apply, it is interesting to consider a particular subset of these networks where the directed edges are used to create symmetric relationships. This subset is the collection of edges that are reciprocal: cases in which there are two nodes A and B such that A links to B and B also links to A. (If the B-A link forms after the A-B link, we say that B reciprocates the link to A.) In our data, only about 3-5% of the edges represent the reciprocation of an existing link, so this is far from being a dominant mode of link creation on these systems. But it is an interesting mode of link creation, in that it represents a directly mutual relationship between two individuals A and B, which is the setting in which balance theory has been more relevant to our earlier analyses.

Our findings for this type of linking suggest the following intuitively natural picture: in the relatively small portion of these networks where mutual back-and-forth interaction takes place, the principles of balance are more pronounced than they are in the larger portions of the networks where signed linking (and hence evaluation of others) takes place asymmetrically. In other words, users treat each other differently in the context of back-and-forth interaction than when they are using links to refer to others who do not link back.

We summarize the results in Table 4. First, we find that the reciprocation of positive A-B edges is closely consistent with balance rather than status, while the reciprocation of negative edges seems to follow a hybrid of the two principles.
Specifically, if A links positively to B, then balance predicts that B should link positively to A, while status predicts that B has the higher status and should therefore link negatively to A. For the two systems in which we have data on the order of edge creation, Epinions and Wikipedia, we find that the data clearly supports the balance interpretation, as shown in Table 4. When a B-A link reciprocates a positive A-B link, this B-A link is positive well over 90% of the time, much higher than the roughly 80% fraction of positive links in the system as a whole.

Reciprocation of a negative A-B link, on the other hand, displays ingredients of both theories. When A links negatively to B and B subsequently links to A, balance theory predicts a negative link while status theory predicts a positive one (since A should have higher status). In the data, such B-A links are positive roughly 70% of the time. This shows that users respond to a negative link with a positive link a majority of the time, but still at a rate below the 80% fraction of positive links in the system as a whole, suggesting a deviation in the direction of the balance-based interpretation.

From Table 4, it is also interesting to observe how similar the probabilities for all kinds of reciprocation are between the two systems Epinions and Wikipedia. This is particularly striking given how different the level of public display of link signs is on these systems; it suggests that these rates of alignment in the signs are being driven by forces that may be relatively robust to the way in which link signs are presented.

    Epinions      Triads     P(RSS)   P(+|+)   P(−|−)
    Balanced      348,538    0.929    0.941    0.688
    Unbalanced    74,860     0.788    0.834    0.676

    Wikipedia     Triads     P(RSS)   P(+|+)   P(−|−)
    Balanced      53,973     0.912    0.934    0.336
    Unbalanced    13,542     0.661    0.878    0.195

Table 5. Edge reciprocation in balanced and unbalanced triads. Triads: number of balanced/unbalanced triads in the network where one of the edges was reciprocated. P(RSS): probability that the reciprocated edge is of the same sign. P(+|+): probability that the + edge is later reciprocated with a plus. P(−|−): probability that the − edge is reciprocated with a minus.

The Role of Triadic Structure in Reciprocation
We now consider how reciprocation between A and B is affected by the context of A and B's relationships to third nodes X. Specifically, suppose that an A-B link is part of a directed triad in which each of A and B has a link to or from a node X. Now, B reciprocates the link to A. As indicated in Table 5, we find that the B-A link is significantly more likely to have the same sign as the A-B link when the original triad on A-B-X (viewed as an undirected triad) is structurally balanced. In other words, when the initial A-B-X triad is unbalanced, there is more of a latent tendency for B to "reverse the sign" when she links back to A. The effect holds in all cases; it is more pronounced in Wikipedia than in Epinions, which is interesting given the difference in how public the edge signs are.

This result further indicates how balance-based effects seem to be at work in the portions of the networks where directed edges point in both directions, reinforcing mutual relationships. We conjecture that this tension between mutuality and asymmetry in different parts of the network will be relevant in understanding more deeply the interplay between status and balance effects in shaping the formation of links.

FURTHER STRUCTURAL ANALYSIS OF SIGNED LINKS
Finally, we explore some additional connections between network structure and the signs of links, focusing on the embeddedness of edges and on the subgraphs consisting only of positive links and only of negative links. For these structural results, we analyze the networks as undirected graphs.
Embeddedness of positive and negative ties
We begin by trying to characterize the parts of the network in which positive ties are more likely to occur. Roughly, we find that positive ties are more likely to be clumped together, while negative ties tend to act more like bridges between islands of positive ties.

We explore this issue in Figure 3 by plotting the probability that an edge is positive as a function of its embeddedness, i.e., the number of common neighbors that its endpoints have [12], or equivalently, the number of distinct triads the edge participates in. For each dataset we plot two curves. In green, we show the results of a random-shuffling baseline: the sign probability we would get as a function of embeddedness if edge signs were determined randomly and independently with probability p for each edge. As is clear, there is no dependence here between an edge's sign and its embeddedness, so the green curve is approximately flat.

However, in the real data (red) we see a completely different picture. Edges that are not well embedded (with endpoints having fewer than around 10 shared neighbors) tend to be more negative than expected based on the background probability p of positive ties. However, as an edge is more embedded (participating in more triads) it tends to be increasingly positive. That is, a link is significantly more likely to be positive when its two endpoints have multiple neighbors (of either sign) in common. These findings are consistent across all three datasets. This suggests that positive edges tend to occur in better embedded (densely linked) groups of nodes, while negative edges tend to participate in fewer triangles, which indicates that they act as connections between the well-embedded sets of positive ties.

As mentioned in the Introduction, this observation is not part of the formulation of balance theory (and does not follow from it), but it is consistent with the notion from social-capital theory of embedded edges being more "on display" [3, 5].
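The embeddedness curves described above could be computed along the following lines. This is a minimal sketch, not the procedure actually used for Figure 3: the function names and the `{(u, v): sign}` representation of an undirected signed graph are assumptions made for illustration, and the baseline draws each sign independently with probability p as described in the text.

```python
import random
from collections import defaultdict


def positive_fraction_by_embeddedness(signed_edges):
    """Map embeddedness k -> fraction of positive edges among edges with k common neighbors.

    signed_edges: dict {(u, v): sign}, one entry per undirected edge, sign in {+1, -1}.
    """
    adj = defaultdict(set)
    for u, v in signed_edges:
        adj[u].add(v)
        adj[v].add(u)
    pos, tot = defaultdict(int), defaultdict(int)
    for (u, v), sign in signed_edges.items():
        k = len(adj[u] & adj[v])  # embeddedness: number of common neighbors
        tot[k] += 1
        pos[k] += sign > 0
    return {k: pos[k] / tot[k] for k in tot}


def random_signs(signed_edges, p, rng):
    """Baseline: keep the edges, draw each sign independently, positive with probability p."""
    return {e: (1 if rng.random() < p else -1) for e in signed_edges}


# Toy graph: a positive triangle a-b-c with a negative pendant edge c-d.
graph = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1, ("c", "d"): -1}
curve = positive_fraction_by_embeddedness(graph)
baseline = positive_fraction_by_embeddedness(random_signs(graph, 0.75, random.Random(0)))
```

In the toy graph the three triangle edges each have one common neighbor and the pendant edge has none, so the real curve separates positive and negative edges by embeddedness while the baseline depends only on the random draw.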
Moreover, among our three datasets, this tendency of embedded edges to be positive is most pronounced for the Wikipedia voting data. This is also the only one of the three sites where the social relations are explicitly displayed to a broad set of users, thus putting the relations even more highly on display. Thus these results are particularly well explained in terms of implicit pressure to remain positive.

All-Positive and All-Negative Networks
To explore further the different roles played by positive and negative links in these networks, we study the sub-networks composed exclusively of the positive links and exclusively of the negative links. That is, we define the all-positive network to be the subgraph consisting only of the positive links, and the all-negative network to be the subgraph consisting only of the negative links. We also compare these to randomized baselines, in which we first randomly shuffle the edge signs in the full network, and then extract the all-positive and all-negative networks from these shuffled versions.
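The construction just described can be sketched as follows. The helper names and the `(u, v, sign)` edge format are assumptions for illustration; the shuffle permutes the existing signs across edges, so the overall numbers of positive and negative links are preserved.

```python
import random


def split_by_sign(edges):
    """Return (all_positive, all_negative) edge lists from (u, v, sign) triples."""
    positive = [(u, v) for u, v, sign in edges if sign > 0]
    negative = [(u, v) for u, v, sign in edges if sign < 0]
    return positive, negative


def shuffle_signs(edges, rng):
    """Randomized baseline: same edges, with the signs randomly permuted among them."""
    signs = [sign for _u, _v, sign in edges]
    rng.shuffle(signs)
    return [(u, v, s) for (u, v, _old), s in zip(edges, signs)]


edges = [("a", "b", 1), ("b", "c", -1), ("c", "d", 1), ("d", "a", 1)]
real_pos, real_neg = split_by_sign(edges)
rnd_pos, rnd_neg = split_by_sign(shuffle_signs(edges, random.Random(0)))
```

Because the baseline keeps the graph and the sign counts fixed, any structural difference between the real and shuffled sub-networks reflects where the signs sit, not how many there are.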
[Table 6: the numerical entries did not survive in this copy. Columns: Size (Nodes, Edges), Clustering (Real, Rnd), Component (Real, Rnd); rows: the negative (−) and positive (+) network of each dataset, beginning with Epinions.]

Table 6. Networks composed of only positive (negative) edges. Real: network induced on the positive (negative) edges. Rnd: network where edge signs are randomly permuted. Clustering: fraction of closed triads (closed triads divided by number of length 2 paths). Component: fraction of nodes in the largest connected component.
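The two structural measures named in the caption of Table 6 can be computed as in the sketch below. It assumes an undirected edge list of `(u, v)` pairs and hypothetical function names; by default the component fraction is taken over the nodes that appear in the given edge list, which is an assumption about the denominator rather than something the caption specifies.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def clustering(edges):
    """Closed triads divided by the number of length-2 paths A-B-C."""
    adj = build_adjacency(edges)
    paths = closed = 0
    for _center, nbrs in adj.items():
        for a, c in combinations(nbrs, 2):  # each pair is one path a-center-c
            paths += 1
            closed += c in adj[a]
    return closed / paths if paths else 0.0


def largest_component_fraction(edges, nodes=None):
    """Fraction of nodes lying in the largest connected component."""
    adj = build_adjacency(edges)
    all_nodes = set(adj) if nodes is None else set(nodes) | set(adj)
    seen, best = set(), 0
    for start in all_nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            u = queue.popleft()
            size += 1
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(all_nodes) if all_nodes else 0.0


toy = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")]
toy_clustering = clustering(toy)
toy_component = largest_component_fraction(toy)
```

Applying both functions to the real and the sign-shuffled all-positive and all-negative edge lists would give the Real and Rnd columns of the table.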
Table 6 summarizes several structural properties of these networks and their randomized variants. First, we consider the amount of clustering, defined as the fraction of A-B-C paths in which the A-C edge is also present (thus forming a "closed" triad A-B-C). In all three datasets, we find that the all-positive networks have significantly higher clustering than their randomized counterparts, and the all-negative networks have significantly lower clustering. This further reinforces the observation that positive edges tend to occur in clumps, while negative edges tend to span clusters.

Interestingly, both the all-positive and all-negative networks are less well-connected than expected, in the sense that their largest connected components are smaller than those of their randomized counterparts. While this may seem initially counter-intuitive, one possible interpretation is as follows. The giant components of real social networks are believed to consist of densely connected clusters linked by less embedded ties [11, 19]. The all-positive and all-negative networks in the real (rather than randomized) datasets are each biased toward one side of this balance: the all-positive networks have dense clusters without the bridging provided by less embedded ties, while the all-negative networks lack a sufficient abundance of dense clusters to sustain a large component.

We also consider the fraction of nodes that are outliers with respect to in- and out-degree in the all-positive and all-negative networks, with degrees exceeding twice the mean for the network. (For reasons of space, these numerical results are not shown in the table.) These outlier fractions remain largely unchanged when the edge signs are randomized, with two exceptions that each hint at interesting conclusions for the effects of displaying signed edges to users. First, the fraction of outliers for positive in-degree is higher than expected on Wikipedia, where edge signs are more public.
This suggests a possible tendency for an excess of users to conform to already positive voting outcomes. Second, the fraction of outliers for negative out-degree is lower than expected on Epinions and Slashdot, where edge signs are less public. This is a bit more surprising; it suggests that despite the less public nature of the signs, there are fewer people who are prolific in their negative evaluations, either because the dynamics of these sites suppress this kind of behavior, or because the sites are not attracting people who engage in it.

CONCLUSION
Social networks underlying current social media sites often reflect a mixture of positive and negative links. Here we have investigated two theories of signed social networks: balance and status. Balance is a classical theory from social psychology, which in its strongest form postulates that when considering the relationships between three people, either only one or all three of the relations should be positive. Status is a theory of directed signed networks which postulates that when person A makes a positive link to person B, then A is asserting that B has higher status, with a negative link from A analogously implying that A believes B has lower status. These two theories make different predictions for the frequency of different patterns of signed links in a social network. On networks derived from Epinions, Slashdot, and Wikipedia, we find that each model predicts certain kinds of social relationships, and that there is strong consistency in how the models fit the data across these three relatively different settings. Moreover, differences in results between the datasets highlight some interesting aspects of how the sites present information.

[Figure 3: three panels, (a) Epinions, (b) Slashdot, (c) Wikipedia. x-axis: number of common neighbors; y-axis: fraction of plus edges; curves: Network and Randomized.]

Figure 3. Embeddedness of positive ties in the network. More embedded edges tend to be more positive.

We have discussed the central interpretations of our findings, and here we briefly review some of the most salient. When the networks are viewed as undirected graphs, we find strong evidence for a weak form of structural balance, observing that in all three datasets triangles with exactly two positive signs are massively underrepresented in the data relative to chance, while triangles with three positive edges are overrepresented.
We further find that a link is significantly more likely to be positive when its two endpoints have multiple neighbors (of either sign) in common, a finding that connects balance with notions from the theory of social capital. This is particularly pronounced for Wikipedia, where the signs of edges are also the most publicly prominent.

When the networks are viewed as directed graphs, on the other hand, incorporating the fact that each link is created by one individual to point to another, we find that many of the basic predictions of balance theory no longer apply. Instead, the signs of directed links closely follow the predictions of the theory of status we develop, in which inferences about the sign of a link from A to B can be drawn from the mutual relationships that A and B have to third parties X. The signs and directions of these relationships to X provide information about the status levels of A and B, which in turn accurately predict the deviations in the sign of their interaction from broader background distributions. Investigating different contexts for links, and the differences between one-way and reciprocated links, sheds further light on the subtle ways in which users of these systems draw on behaviors rooted in both balance and status when they link to one another.

REFERENCES
1. M. J. Brzozowski, T. Hogg, G. Szabó. Friends and foes: ideological social networking. Proc. ACM CHI, 2008.
2. M. Burke and R. Kraut. Mopping up: Modeling wikipedia promotion decisions. Proc. CSCW, 2008.
3. R. S. Burt. The network structure of social capital. Research in Organizational Behavior, 22:345–423, 2000.
4. D. Cartwright, F. Harary. Structural balance: A generalization of Heider's theory. Psych. Rev.
5. American Journal of Sociology, 94(1988).
6. D. Cosley, D. Frankowski, S. Kiesler, L. Terveen, J. Riedl. How oversight improves member-maintained communities. Proc. CHI, 2005.
7. J. A. Davis. Clustering and structural balance in graphs. Human Relations, 20(2):181–187, 1967.
8. P. Doreian and A. Mrvar. A partitioning approach to structural balance. Social Networks, 18:149–168, 1996.
9. M. H. Fisek, J. Berger, R. Norman. Participation in heterogeneous and homogeneous groups: A theoretical integration. American Journal of Sociology, 97(1991).
10. E. Gilbert, K. Karahalios. Predicting tie strength with social media. Proc. ACM CHI, 2009.
11. M. Granovetter. The strength of weak ties. American Journal of Sociology, 78:1360–1380, 1973.
12. M. Granovetter. Economic action and social structure: The problem of embeddedness. American Journal of Sociology, 91(3):481–510, Nov. 1985.
13. R. V. Guha, R. Kumar, P. Raghavan, A. Tomkins. Propagation of trust and distrust. Proc. WWW, 2004.
14. F. Heider. Attitudes and cognitive organization. Journal of Psychology, 21:107–112, 1946.
15. J. Kunegis, A. Lommatzsch, C. Bauckhage. The Slashdot Zoo: Mining a social network with negative edges. Proc. WWW, 2009.
16. C. Lampe, E. Johnston, P. Resnick. Follow the reader: Filtering comments on Slashdot. Proc. CHI, 2007.
17. D. Liben-Nowell, J. Kleinberg. The link-prediction problem for social networks. J. American Society for Information Science and Technology, 58(2007).
18. M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, 2003.
19. J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, and A.-L. Barabasi. Structure and tie strengths in mobile communication networks. Proc. Natl. Acad. Sci. USA, 104(2007).
20. B. Pang and L. Lee. Opinion Mining and Sentiment Analysis. Now Publishers, 2008.
21. S. Wasserman, K. Faust. Social Network Analysis: Methods and Applications. Camb. U. Press, 1994.
22. D. Willer.