Chasm in Hegemony: Explaining and Reproducing Disparities in Homophilous Networks

Yiguang Zhang, Jessy Xinyi Han, Ilica Mahajan, Priyanjana Bengani, and Augustin Chaintreau
Columbia University; Massachusetts Institute of Technology

Abstract

In networks with a minority and a majority community, it is well studied that minorities are under-represented at the top of the social hierarchy. However, researchers are less clear about the representation of minorities in the lower levels of the hierarchy, where other disadvantages or vulnerabilities may exist. We offer a more complete picture of social disparities at each social level, with empirical evidence that minority representation exhibits two opposite phases. At the higher rungs of the social ladder, the representation of the minority community decreases. Lower on the ladder, which is more populous, the representation of the minority community improves as one ascends. We refer to this opposition between the upper and lower levels as the chasm effect. Previous models of network growth with homophily fail to capture and explain this chasm effect. We analyze the interactions among a few well-observed network-growth mechanisms with a simple model to reveal the sufficient and necessary conditions for both phases of the chasm effect to occur. By generalizing the simple model naturally, we present a complete bi-affiliation bipartite network-growth model that can capture disparities at all social levels and reproduce real social networks. Finally, we use two applications, in advertising and fact-checking, to illustrate that addressing the chasm effect can create fairer systems. This demonstrates the potential impact of the chasm effect on future research into minority-majority disparities and fair algorithms.

The "glass-ceiling" effect has many real-world applications. It is invoked to describe the invisible barrier that women, or members of any minority group, hit in their careers as they approach the upper echelons of management [1][2]. The top of the hierarchy has been well studied, whereas minority representation in the rest of the social hierarchy has received less attention. A complete characterization of social disparities at all levels of the hierarchy helps answer two questions: at which point a minority group starts experiencing a systemic disadvantage, and at which rung of the ladder, if any, minorities are fairly represented.

We tackle these questions with real-world datasets (QQ, WhatsApp, and Instagram) in an attempt to understand the distribution of minority representation across the entire hierarchy. Our main finding is surprising, and the evidence for it recurs across datasets. The ratio of people belonging to a minority group initially increases as one moves up the lower layers of the hierarchy, then reaches a plateau and drops. We refer to this effect as a "chasm" because people who observe only the lower or only the upper layer of a hierarchy might agree that a systemic bias is present, but would hastily claim that it runs in opposite directions. This is in striking contrast to the monotonic behavior one would expect from all previous systemic models of hegemonic biases. We prove that previous models cannot explain our observation, and we provide the first generative model that offers a simple explanation and is general enough to apply broadly.

The question we ask in this paper concerns the causes of this chasm effect.
What are the mechanisms that interact to create both the glass-ceiling effect and the chasm effect? In particular, what role do social networks play in creating these two effects?

Previous studies on the glass-ceiling effect have provided mechanisms that capture it in unipartite networks such as Facebook [3]. However, the same mechanisms do not capture the chasm effect. In this paper, we primarily focus on bipartite networks, which contain two types of entities of different natures, while also showing the presence of the chasm effect in unipartite networks. We are interested in bipartite networks such as WhatsApp for two reasons: (1) bipartite networks are less understood and, due to their complexity, more intriguing; (2) many social platforms are now group-based, with members finding communities of interest within the larger network.

We analyze the interactions among a few well-observed network-growth mechanisms with a simple model to reveal the sufficient and necessary conditions for the glass-ceiling effect and the chasm effect to both be present. We then generalize the simple model naturally and present a complete bi-affiliation bipartite network-growth model. We demonstrate the effectiveness of the proposed model through both mathematical proofs and synthetic data generation. Our generative model is the first to capture the chasm effect in social disparities.

This study has important practical applications, especially as it puts a spotlight on structural biases in bipartite networks and hints at ways to address them. More specifically, the chasm effect provides a foundation for allocating resources differently in diverse settings. The aim is to minimize bias among the people who constitute a large portion of the population and are more disadvantaged and vulnerable.
We present two examples taken from different contexts:
(1) (gender fairness) We hope to provide recruiters with a better job-placement strategy if they want to diversify their pool of candidates.
(2) (political fairness) Politics-related group chats keep their conversations inaccessible outside the immediate community. In this constrained environment, we aim to show how fake news can have a greater adverse impact on the minority population.

In summary, our main contributions are:
• We demonstrate the existence of the chasm effect with empirical evidence from real-world datasets, and characterize the phenomenon in depth to provide a more complete picture of social disparities. That is, we show that the ratio of the minority community does not decrease monotonically as we move up the hierarchy. (Section 3)
• We analyze the interactions among network-growth mechanisms and derive the mechanisms necessary for both the chasm effect and the glass-ceiling effect to be present in bipartite networks. (Section 4)
• We propose a complete bipartite bi-affiliation network-growth model that generalizes the necessary mechanisms discussed in Section 4. The generalized model is capable of reproducing real-world social networks. Under the generalized model, we prove that both types of entities in the generated networks have power-law degree distributions. We also specify mathematically the sufficient and necessary conditions for both the glass-ceiling effect and the chasm effect to be present. (Section 5)
• Finally, we provide two real-world applications of our findings, job advertisement and fact-checking, where the chasm effect could affect the direction of bias. These applications motivate the importance of considering the chasm effect. (Section 6)

Together, these results suggest that the chasm effect can be observed frequently, at least in online networks that may exhibit simple selective homophily dynamics, and that it has consequences.
We urge some caution, however: our results do not prove that the chasm is unavoidable. Some social networks (and, under some conditions, our general model) can exhibit a systemic monotonic bias against minority groups at all levels of the hierarchy.

2 Related Work

Social disparities and the hegemony of the majority community have been widely studied in unipartite social networks. It is well observed that disadvantages are exerted on the minority community, for example in the gender gap [1][2] or in rural-urban inequality [4]. It has also been shown that structural bias in unipartite social networks, operating through homophilous preferential attachment, can create such disadvantages at the top of the hierarchy [3]. These effects can be reinforced when recommendation algorithms are applied [5]. However, no existing model analyzes the structural biases that may exist beyond the top of the hierarchy.

Further, studying hegemony is no longer straightforward in bipartite networks. Bipartite networks are often composed of different types of entities, and it is only meaningful to study homophily within a single entity type.

Projection can convert bipartite networks back to unipartite networks, but this loses important network information [6]. A model that studies hegemony directly on bipartite networks is therefore needed. Unfortunately, there are few bipartite network models and even fewer studies of social disparities in them. Previous analytical work [7] and [6] provides notation for studying bipartite networks and extends several common notions from unipartite to bipartite networks, but does not consider hegemony. Random graph models such as the stochastic block model can be used to model homophily, but they do not reproduce the large range of degrees [8] well observed in social networks. Configuration models such as exponential random graph models [9] can be modified to study homophily, but by nature they are restricted to static graphs with no internal reinforcing dynamics. We hereby introduce the first generative model that can be used to analyze hegemony in bipartite networks.

One important application of bipartite networks is fairness in fake-news detection in group-member networks. In the last decade, researchers have expended tremendous effort on detecting fake news automatically. Approaches analyze texts [10][11], images [12], propagation models [13], and more [14][15]. Most automatic detection methods apply only to public social media, where platforms have access to all content. Private platforms (such as WhatsApp, which is end-to-end encrypted) lack visibility into the content, so they are unable to detect misinformation proactively and automatically. Instead, one of the ways in which they detect potentially inaccurate political news is through user reports. Because of the diversity of information and the massive volume of queries received, fact checkers often prioritize stories reported as fake by a large number of users [16].
When there is more than one political party in the network, such detection methodologies may create unfairness, as the party with more members could be subject to more misinformation. To the best of our knowledge, our results are the first to tackle the factors affecting the fairness of fake-news detection in encrypted social media.

Here we define a hegemonic subset as one that is systematically over-represented in the tail of the most popular nodes. It was shown that a majority affiliation among the nodes can become hegemonic under simple rich-get-richer and homophily dynamics [3]. We studied three large networks with affiliations, including a bipartite one. In all of them, such hegemonic subsets exhibit a remarkable paradox. Starting at small degrees, members of the hegemonic subset appear to become scarcer as degree increases, while the fraction of members from the other subset initially increases. This creates a chasm. An observer who concentrates on a partial, local view of the degree distribution may hastily conclude that network growth either disproportionately favors or disproportionately disfavors those in the hegemonic subset.

QQ dataset[17]

One of the most popular instant messengers for group chats in China is Tencent QQ, which has over 700 million active users. Users can create new groups or join existing ones. Depending on the account level of the group creator, QQ group sizes are capped at 100, 200, 500, or 1000 members. This dataset contains 274,335,183 users and 58,523,079 groups. Of these, 273,204,518 users have gender information, and 48,676,355 groups have complete information about the group identifier, member list, and group creation date. Females make up 42.5% of the users in this dataset. We therefore label groups whose members are less than 42.5% female as male-dominated, and groups with more than 42.5% females as female-dominated.

We observe that the group-size distribution of this dataset becomes discontinuous at 100, 200, and 500 members due to the imposed group-size caps. To avoid the impact of these discontinuities, we focus our analysis on groups of size no larger than 100. These account for 99.2% of all groups in the dataset and have an average female membership ratio of 40.9%.
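The QQ labeling rule and size filter described above can be sketched as follows. The input format is a hypothetical one, not described in the paper: each group is a list of member genders coded "F"/"M". The text does not say how a group whose female share equals exactly 42.5% is labeled, so the sketch returns None for that case.

```python
from typing import Dict, List, Optional

FEMALE_SHARE = 0.425  # overall share of female users in the QQ dataset
MAX_GROUP_SIZE = 100  # analysis is restricted to groups of at most 100 members


def label_qq_group(member_genders: List[str], threshold: float = FEMALE_SHARE) -> Optional[str]:
    """Compare a group's female share with the overall female share.

    Assumption: genders are coded "F"/"M" (hypothetical encoding); an exact
    tie returns None because the text does not specify how ties are handled.
    """
    share = sum(1 for g in member_genders if g == "F") / len(member_genders)
    if share < threshold:
        return "male-dominated"
    if share > threshold:
        return "female-dominated"
    return None


def label_qq_groups(groups: Dict[str, List[str]],
                    max_size: int = MAX_GROUP_SIZE) -> Dict[str, Optional[str]]:
    """Drop empty groups and groups above the size cut-off, then label the rest."""
    return {gid: label_qq_group(members)
            for gid, members in groups.items()
            if 0 < len(members) <= max_size}


example = {"a": ["F", "M", "M"], "b": ["F", "F", "M"], "c": ["F"] * 60 + ["M"] * 41}
print(label_qq_groups(example))  # group "c" (101 members) is filtered out
```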

WhatsApp dataset[18][19]

WhatsApp is one of the most widely used messaging apps in the world. The WhatsApp data we use was collected over a period of 9 months, from October 2018 to June 2019. It includes 2,092 groups centered on political conversations and 205,880 unique users. The party affiliation of each group was labeled by the authors of [19] based on the group title and some of its content. Based on the ideology and relevant reports of each group's party affiliation, we characterize each group's political leaning as pro-BJP or anti-BJP. BJP stands for the Bharatiya Janata Party, the current ruling party in India. To obtain sound and rigorous results, we only consider groups whose political leanings are evident. Once we identify the political leanings of groups, we label each user. A user is pro-BJP if the ratio of pro-BJP groups among the groups they joined exceeds the overall pro-BJP group ratio in the dataset, and anti-BJP in the reverse case. Overall, we obtain 1,198 pro-BJP groups and 465 anti-BJP groups. These contain 62,920 pro-BJP members and 21,625 anti-BJP members, who share 897 images manually labeled as misinformation.

The data are very sparse for groups with more than 165 members in this dataset, so we restrict our analysis to groups of size less than 165. Furthermore, since WhatsApp is an end-to-end encrypted application where members have a reasonable expectation of privacy, we drop all groups with fewer than 52 members ( % of the maximum group size).

Instagram dataset[5]

Instagram is a photo- and video-sharing platform where people like and comment on content. The Instagram dataset was collected between 2014 and 2015 and contains a total of 553,628 distinct users, whose genders were inferred from their names. Females make up 54.4% of all users in this dataset. Even though females make up more than half of the data, they are still considered the disadvantaged group for two reasons: (1) other features of this network, such as the degree distribution, suggest a bias against female users; (2) this keeps our treatment consistent with prior work.

An unequal proportion of two affiliations arises in different identity contexts, such as gender (social identity) and political leaning (political identity). The QQ dataset illustrates the social-identity aspect. There, females make up 42.5% of the population and 41.0% of the groups are female-dominated, so females are considered the minority and males the majority. Our WhatsApp dataset exhibits the political-identity aspect. There, 25.6% of all members and 28.0% of all groups are anti-BJP, so anti-BJP is denoted as the minority and pro-BJP as the majority. The majority and minority groups in these two datasets are of completely different natures. Even so, our later findings show that they share some similar properties, which merits further study.
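The sketch below is an illustration under stated assumptions. It implements the WhatsApp size window and user-labeling rule from the dataset description, then recomputes the minority shares quoted above from the reported group and member counts. Group leanings are given as the hypothetical strings "pro" and "anti". Users whose pro-BJP share equals the overall ratio exactly are left unlabeled, because the text does not cover that case.

```python
from typing import Dict, List, Optional


def keep_whatsapp_group(size: int) -> bool:
    """Size window used in the analysis: at least 52 and fewer than 165 members."""
    return 52 <= size < 165


def label_users(user_groups: Dict[str, List[str]],
                group_leaning: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Label a user pro-BJP if the share of pro-BJP groups they joined exceeds the
    overall share of pro-BJP groups, and anti-BJP if it is below that share."""
    overall = sum(1 for v in group_leaning.values() if v == "pro") / len(group_leaning)
    labels: Dict[str, Optional[str]] = {}
    for user, groups in user_groups.items():
        share = sum(1 for g in groups if group_leaning[g] == "pro") / len(groups)
        if share > overall:
            labels[user] = "pro-BJP"
        elif share < overall:
            labels[user] = "anti-BJP"
        else:
            labels[user] = None  # tie: not specified in the text
    return labels


def minority_share(majority_count: int, minority_count: int) -> float:
    return minority_count / (majority_count + minority_count)


print(round(minority_share(1198, 465), 3))     # anti-BJP share of groups  -> 0.28
print(round(minority_share(62920, 21625), 3))  # anti-BJP share of members -> 0.256
```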

Figure 1: Homophily mechanism: the network exhibits homophily in both datasets. In the QQ dataset, the minority-majority imbalance arises in the context of gender disparity; in the WhatsApp dataset, it arises in the context of political parties. That is, people tend to connect with others of their own affiliation.

The degree distribution of a network reflects how resources and power are distributed in society. Previous studies on one-mode social networks demonstrate a "rich-get-richer" mechanism [20][8]: those with more connections have an advantage in building even more connections. In bi-affiliation bipartite networks, we study each affiliation and each type of entity separately. We take the number of members in a group as the degree of a group entity, and the number of groups a member joins as the degree of a member entity. Both the group-size distributions and the member-degree distributions show a smooth, slow decay at small degrees and a fast decay at large degrees. This is a "rich-get-richer" result similar to the one in one-mode networks. Specifically, in the QQ dataset, female-dominated groups follow a power law with exponent -4.00, and their male counterparts -3.51. QQ female members follow a power law with exponent -3.82, and their male counterparts the same. In the WhatsApp dataset, anti-BJP groups follow a power law with exponent -2.67, and their pro-BJP counterparts -2.48. WhatsApp anti-BJP members and their pro-BJP counterparts follow power laws with almost identical exponents, -2.29 and -2.23 respectively. This "rich-get-richer" result on bi-affiliation bipartite networks illustrates a few basic ideas about member-group interactions:
(1) members are more likely to join large groups, likely due to those groups' popularity or their potential to offer more resources;
(2) this higher tendency of members to join large groups is more pronounced when joining majority groups;
(3) members who are active in joining groups are more likely to join new groups than those who are less active.
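The text does not state how the exponents above were fitted. One common option is the approximate discrete maximum-likelihood estimator of Clauset, Shalizi, and Newman, α ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)), applied to the n degrees k_i ≥ k_min. This is an assumption, not necessarily the authors' procedure. A fitted α corresponds to a reported exponent of −α, i.e., p(k) ∝ k^(−α). The cut-off k_min is a hypothetical parameter that must be chosen separately. The sampler is included only to sanity-check the estimator on synthetic degrees.

```python
import math
import random
from typing import List


def powerlaw_exponent(degrees: List[int], k_min: int) -> float:
    """Approximate discrete MLE of alpha for p(k) ~ k^(-alpha), using k >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    return 1.0 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


def sample_powerlaw_degrees(n: int, alpha: float, k_min: int, seed: int = 0) -> List[int]:
    """Approximately power-law integers: a continuous power law starting at
    k_min - 1/2, rounded to the nearest integer (values are always >= k_min)."""
    rng = random.Random(seed)
    return [int((k_min - 0.5) * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) + 0.5)
            for _ in range(n)]


synthetic = sample_powerlaw_degrees(100_000, alpha=3.0, k_min=10, seed=1)
print(round(powerlaw_exponent(synthetic, k_min=10), 2))  # should be close to 3.0
```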

Homophily is a well-observed phenomenon: people tend to connect with those who are similar to them [21]. To test for homophily, we count minority-majority member pairs. Two members form a member pair if they are in the same group, and they form multiple pairs if they share multiple groups. We count the member pairs in the network where one end is a member of the minority affiliation and the other end is a member of the majority. When there is no homophily in the network, the ratio of minority-majority member pairs to all member pairs is r(1 − r), where r is the fraction of red (minority) members in the network. An actual number of minority-majority member pairs smaller than this expected number is therefore an indication of homophily. Both the QQ dataset and the WhatsApp dataset show a strong indication of homophily. The actual number of minority-majority member pairs (orange line) is significantly smaller than the expected value assuming no homophily (yellow line). We therefore conclude that homophily is present in these bipartite networks.

In conclusion, our analysis of the real-world data illustrates the following three mechanisms in bi-affiliated group-member networks:
1. minority-majority affiliation: the two affiliations differ non-negligibly in size.
2. rich-get-richer: new members are more likely to join large groups; members who are active in joining groups are more likely to join new groups than those who are less active.
3. homophily: members are more likely to join groups of their own affiliation.

The glass-ceiling effect depicts the under-representation of minorities at the higher rungs. We zoom out to study minority representation across the whole social hierarchy. At the lowest level, minorities are also under-represented. This under-representation eases as they move up the social ladder but deteriorates closer to the top, which matches the glass-ceiling effect at the higher levels. Minority representation thus exhibits opposite trends as we move up the lower rungs and the upper rungs. We refer to this phenomenon as the "chasm effect" between the lower and upper levels.

In our bi-affiliated bipartite networks, we observe this chasm effect in both the group mode and the member mode. As shown in Fig. 2, we calculate the ratio of minority-dominated groups in each group-size bucket and find that it does not decrease monotonically. In the QQ dataset, the ratio of female-dominated groups increases among groups of size 1-55 and decreases afterwards. In the WhatsApp dataset, the ratio of anti-BJP groups increases for groups of size 52-85 and decreases thereafter. In both plots, very small and very large minority groups are under-represented, while representation improves among medium-sized groups. Similarly, in Fig. 3, we calculate the average ratio of minority members at each group size and find a similar non-monotonic trend.

Figure 2: Chasm effect on group ratio: in both datasets, the ratio of minority groups (vs. majority groups) is not monotone. As expected from the glass-ceiling effect, the ratio decreases for large group sizes. However, it increases for small groups, which constitute a larger part of all groups. In this plot, the radii of the light-red circles are proportional to group counts.

Figure 3: Chasm effect on average member ratio: the average ratio of minorities within groups of fixed size first increases and then decreases. The radii of the light-red circles are proportional to the sum of group sizes.
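The pair-counting homophily test and the per-size ratios plotted in Figs. 2 and 3 can be sketched as below. The input format is hypothetical: groups are lists of distinct member ids, and affiliations are booleans. The sketch counts each unordered co-member pair once per shared group. Under that convention, the no-homophily fraction of mixed pairs is 2r(1 − r). The expression r(1 − r) corresponds to counting ordered (minority, majority) pairs. Either convention works, as long as the observed and expected counts use the same one. Ratios are computed per exact group size, because the bucketing used in the figures is not specified here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mixed_pair_ratio(groups: List[List[int]],
                     is_minority: Dict[int, bool]) -> Tuple[float, float]:
    """(observed, expected) fraction of unordered co-member pairs that join a
    minority and a majority member; a pair is counted once per shared group."""
    mixed = total = 0
    for members in groups:
        n = len(members)
        red = sum(is_minority[m] for m in members)
        mixed += red * (n - red)
        total += n * (n - 1) // 2
    if total == 0:
        raise ValueError("no group has two or more members")
    r = sum(is_minority.values()) / len(is_minority)
    return mixed / total, 2 * r * (1 - r)


def minority_ratios_by_size(groups: List[List[int]], group_is_minority: List[bool],
                            is_minority: Dict[int, bool]) -> Dict[int, Tuple[float, float]]:
    """For each group size k: (share of minority groups, mean minority share of members)."""
    stats = defaultdict(lambda: [0, 0, 0.0])  # [groups, minority groups, summed member shares]
    for members, minority_group in zip(groups, group_is_minority):
        s = stats[len(members)]
        s[0] += 1
        s[1] += minority_group
        s[2] += sum(is_minority[m] for m in members) / len(members)
    return {k: (m / n, share / n) for k, (n, m, share) in sorted(stats.items())}


groups = [[0, 1], [2, 3], [0, 2, 3]]
is_min = {0: True, 1: False, 2: False, 3: False}
print(mixed_pair_ratio(groups, is_min))  # observed below expected suggests homophily
```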
In the QQ dataset, the average ratio of female members in a group first increases among groups of size smaller than 55 and decreases afterward. Similarly, the average proportion of anti-BJP members in the WhatsApp dataset increases among groups of size less than 82 and decreases thereafter. In both plots, minority members are under-represented in very small groups, and their representation improves in middle-sized groups.

The above observations have not been studied in the existing social-network literature, but they are non-negligible. First, smaller groups constitute a significant portion of all groups in the networks: 40.9% of groups have sizes smaller than 55 in the QQ dataset, and 41.7% of groups have sizes smaller than 82 in the WhatsApp dataset. Furthermore, this observation is not unique to bipartite networks, as we find a similar non-monotonic result in both projected networks and one-mode networks. We first project the original QQ group-member network to construct a QQ membership network. Its node set consists of the members, and two nodes are joined by an undirected edge if and only if they are in the same group. If two members share multiple groups, they are connected by multiple edges. We define "member degree" as the number of connections a member has, and we calculate the proportion of minority connections at each member degree. In Fig. 4, the average ratio of female connections rises among members with fewer than 95 connections (55.3% of the population) and decreases thereafter. A similar trend appears in the one-mode Instagram dataset. There, the average ratio of female connections increases among users with fewer than 75 connections (99.7% of the population) and decreases afterward.

Figure 4: Chasm effect on unipartite networks: the chasm effect is not unique to bipartite networks. In the projected QQ membership network, as well as in the Instagram network, the ratio of female connections a member has first increases, then decreases. Networks of different types, focusing on different contexts, share this common pattern. This indicates that there may be simple structural patterns that the existing literature does not explain.

This more complete picture of minority representation at every level of a social hierarchy is especially significant. It can provide insights into minorities at the lower rungs, who are far more disadvantaged and vulnerable than those at the higher levels. Previous models of network growth with only the three mechanisms discussed in Section 3.2 are unable to capture or explain this chasm effect (proved in Section 4). This motivates us to propose, in the next sections, a new bi-affiliation bipartite network model that could reveal the complex interactions among several driving mechanisms of social disparities.
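The projection behind Fig. 4 can be sketched without materializing the multigraph. Consider a group of size n with f female members. Each member of that group gains n − 1 connections. Of these, f are female, or f − 1 if the member is female. Summing over groups reproduces the multi-edge convention described in the text, since two members who share several groups are counted once per shared group. The input format (lists of distinct member ids and a boolean is_female map) is a hypothetical one.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def female_connection_ratio_by_degree(groups: List[List[int]],
                                      is_female: Dict[int, bool]) -> Dict[int, float]:
    """Average share of female connections for members at each projected degree."""
    degree = Counter()        # connections in the projected multigraph
    female_links = Counter()  # how many of those connections are female members
    for members in groups:
        n = len(members)
        f = sum(is_female[m] for m in members)
        for m in members:
            degree[m] += n - 1                 # one edge to every other member of the group
            female_links[m] += f - is_female[m]  # do not count the member itself
    per_degree = defaultdict(list)
    for m, d in degree.items():
        if d > 0:  # members only in singleton groups have no connections
            per_degree[d].append(female_links[m] / d)
    return {d: sum(v) / len(v) for d, v in sorted(per_degree.items())}


print(female_connection_ratio_by_degree([[0, 1], [0, 1, 2]], {0: True, 1: False, 2: True}))
```

Where a graph library is preferred, NetworkX's bipartite projection with multigraph output should give the same degrees. That library is named here only as an alternative and was not used by the sketch above.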

We now examine the roles played by the observed mechanisms in creating the glass-ceiling and chasm effects. We also examine how they interact with each other and with another well-observed social-network mechanism. To better characterize these interactions, we use a simple model to show that the two effects can naturally arise under a specific combination of network mechanisms. Moreover, the mechanisms in this combination are necessary conditions for the two effects to occur at the same time.

Formally, we consider a bi-affiliated bipartite network, with one subset of nodes representing members, M, and the other representing groups, G. We assume two affiliations in the network and denote them as red and blue, where the red affiliation represents the minority and the blue affiliation represents the majority. Every member m ∈ M belongs to exactly one of the two affiliations. Similarly, every group g ∈ G belongs to one affiliation. We use N(M ∪ G, t, Θ) to denote a network generated by a model at time step t, where Θ is the set of parameters used to generate networks.

Figure 5: SHM and GSHM: in SHM, at each time t, exactly one connection is built between the set of members and the set of groups. A chosen member can either create a group or join an existing group. To join an existing group, the member selects one based on either (1) the rich-get-richer mechanism or (2) the equal-chance mechanism. If homophily is applied to the chosen mechanism, the member may reject the connection and choose a new group until successfully joining one. The GSHM follows the same steps, with one difference. In SHM, the homophily level is the same across mechanisms and across members of different affiliations; GSHM has differentiated homophily levels.

We assume the following well-observed mechanisms:
1. rich-get-richer: currently active members are likely to join more groups than currently inactive members; large groups are likely to have a higher growth rate than small groups.
2. homophily: members tend to join groups of their own affiliation.
3. equal-chance: members may join groups uniformly at random.

Applying the homophily mechanism to the other two gives rise to three possible homophilous mechanisms: selective homophily on rich-get-richer, selective homophily on equal-chance, and general homophily.
We now test each of them in a simple network-growth model. Formally, we have Θ = (α, η, r, ξ, ρ), where:
• α and η capture the arrival rates of members and groups, respectively;
• r ≤ 1/2 represents the likelihood of a newly arriving member being red;
• 0 ≤ ξ ≤ 1 captures the level of the rich-get-richer mechanism for groups;
• ρ represents the level of homophily in the network.

We now describe this selective homophily model (SHM) in more detail and illustrate it in Figure 5 and Figure 6. At time t = 2, we initialize the bipartite network with one red member connected to a red group and one blue member connected to a blue group. At time t, the network grows as follows:
• Member Growth:
  – (minority-majority) with probability α (0 < α < 1), a new member m* joins the network, and it is colored red with probability r (0 < r ≤ 1/2);
  – (rich-get-richer) otherwise, with probability 1 − α, we randomly pick an existing member m* with probability proportional to deg(m*).
• Group Growth: with probability η (0 < η < 1), the member m* creates a group of color c(m*).
• Connection Growth: with probability 1 − η, the member m* joins an existing group according to the following two steps:
  – (rich-get-richer) with probability ξ, m* picks a group g* with probability proportional to deg(g*).
    ∗ under selective homophily on the rich-get-richer mechanism, or under general homophily: if c(m*) = c(g*), m* joins g* directly; otherwise, m* accepts the connection with probability ρ. If m* does not accept the connection, m* restarts from the beginning of Connection Growth until a new connection is built.
    ∗ under selective homophily on the equal-chance mechanism: m* joins g* directly.
  – (equal-chance) with probability 1 − ξ, m* picks a group g* uniformly at random.
    ∗ under selective homophily on the rich-get-richer mechanism: m* joins g* directly.
    ∗ under selective homophily on the equal-chance mechanism, or under general homophily: if c(m*) = c(g*), m* joins g* directly; otherwise, m* accepts the connection with probability ρ. If m* does not accept the connection, m* restarts from the beginning of Connection Growth until a new connection is built.
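The SHM steps above can be simulated with a short sketch. It illustrates the model under stated assumptions and is not the authors' code. The function and variable names are hypothetical. A member is allowed to join a group it already belongs to, since the text does not rule this out. Degree- and size-proportional sampling is done by keeping one list entry per membership. Each step adds exactly one membership, so after T steps the network has T + 2 memberships.

```python
import random
from collections import Counter
from typing import Dict, List

RGR, EC, GENERAL = "rgr", "ec", "general"  # where homophily is applied (hypothetical labels)


def simulate_shm(steps: int, alpha: float, eta: float, r: float, xi: float,
                 rho: float, variant: str = RGR, seed: int = 0) -> Dict[str, list]:
    rng = random.Random(seed)
    member_color: List[str] = ["red", "blue"]  # t = 2: one red and one blue member,
    group_color: List[str] = ["red", "blue"]   # each in a group of its own color
    member_ends: List[int] = [0, 1]  # member id once per membership -> P(m) ~ deg(m)
    group_ends: List[int] = [0, 1]   # group id once per membership  -> P(g) ~ deg(g)
    homophily_on_rgr = variant in (RGR, GENERAL)
    homophily_on_ec = variant in (EC, GENERAL)

    for _ in range(steps):
        # Member growth: new member with prob. alpha, else degree-proportional pick.
        if rng.random() < alpha:
            m = len(member_color)
            member_color.append("red" if rng.random() < r else "blue")
        else:
            m = rng.choice(member_ends)
        # Group growth: with prob. eta, m creates a group of its own color.
        if rng.random() < eta:
            g = len(group_color)
            group_color.append(member_color[m])
        else:
            # Connection growth: restart until a connection is accepted.
            while True:
                if rng.random() < xi:  # rich-get-richer: size-proportional pick
                    g, homophilous = rng.choice(group_ends), homophily_on_rgr
                else:                  # equal-chance: uniform pick
                    g, homophilous = rng.randrange(len(group_color)), homophily_on_ec
                if (not homophilous or group_color[g] == member_color[m]
                        or rng.random() < rho):
                    break
        member_ends.append(m)
        group_ends.append(g)

    sizes = Counter(group_ends)
    return {
        "member_color": member_color,
        "group_color": group_color,
        "group_size": [sizes[g] for g in range(len(group_color))],
        "edges": list(zip(member_ends, group_ends)),
    }


def red_group_ratio_by_size(group_color: List[str], group_size: List[int]) -> Dict[int, float]:
    """Fraction of red groups among groups of each size k at the final time step."""
    total = Counter(group_size)
    red = Counter(s for c, s in zip(group_color, group_size) if c == "red")
    return {k: red[k] / total[k] for k in sorted(total)}


net = simulate_shm(20_000, alpha=0.5, eta=0.1, r=0.3, xi=0.8, rho=0.2, variant=RGR, seed=1)
ratios = red_group_ratio_by_size(net["group_color"], net["group_size"])
```

The parameter values in the usage line are arbitrary placeholders. Whether the resulting ratios show a chasm depends on Θ, as Lemma 4.1 states only that it happens "under some conditions".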

We now provide formal definitions of the glass-ceiling effect and the chasm effect. First, note that the two subsets of nodes in bipartite networks often represent different entities and should therefore be analyzed separately. For the purpose of this paper, we focus our analysis on the group set. Both the tail glass-ceiling effect and the chasm effect therefore refer to effects on groups.

The tail glass-ceiling effect in bipartite networks captures the phenomenon that groups of one affiliation are under-represented among the largest groups. Let $\mathrm{top}^{(G)}_k(R)$ (respectively $\mathrm{top}^{(G)}_k(B)$) be the number of red (respectively blue) groups of size at least $k$ in N(M ∪ G, t, Θ).

Definition 4.1. (tail glass-ceiling) A network sequence {N(M ∪ G, t, Θ)} exhibits a tail glass-ceiling effect (or glass-ceiling effect for short) against red if there exists an increasing function $k(t)$ such that

$$\lim_{t\to\infty} \mathrm{top}^{(G)}_{k(t)}(B) = \infty \quad \text{and} \quad \lim_{t\to\infty} \frac{\mathrm{top}^{(G)}_{k(t)}(R)}{\mathrm{top}^{(G)}_{k(t)}(B)} = 0. \qquad (1)$$

The chasm effect captures the phenomenon that the representation of groups of the minority affiliation first increases and then decreases as the group size goes up.

Definition 4.2. (chasm) A network sequence {N(M ∪ G, t, Θ)} exhibits a chasm effect against red if there exists K < ∞ such that the ratio of red groups $r^{(G)}_k$, as a function of the group size $k$, increases for $k < K$ and decreases for $k > K$.

We first note that the selective homophily on rich-get-richer mechanism can lead to both the tail glass-ceiling effect and the chasm effect. We establish all of the following lemmas later in a generalized model, so we defer their proofs to corollaries in Section 5.

Lemma 4.1.

Under some conditions on Θ, a network sequence {N(M ∪ G, t, Θ)} generated by SHM with the selective homophily on rich-get-richer mechanism exhibits both the tail glass-ceiling effect and the chasm effect as t goes to infinity.

Previous work on unipartite networks implies that the selective homophily on equal-chance mechanism cannot lead to the glass-ceiling effect [3]. Indeed, this is also true for bipartite networks.
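On a finite network, the limits in Definition 4.1 cannot be evaluated, but the quantities they involve can. The sketch below uses hypothetical helper names. It counts top^(G)_k(R) and top^(G)_k(B), and it applies a finite-window reading of Definition 4.2 to a sequence of ratios ordered by k. Under this reading, which is our assumption, the ratios must strictly increase up to a single peak and strictly decrease after it, with both phases non-empty. On noisy empirical ratios this strict test is too demanding. Some smoothing or bucketing, not specified here, would be needed first.

```python
from typing import List, Sequence, Tuple


def top_k_counts(group_color: Sequence[str], group_size: Sequence[int],
                 k: int) -> Tuple[int, int]:
    """Return (top_k(R), top_k(B)): numbers of red and blue groups of size >= k."""
    red = sum(1 for c, s in zip(group_color, group_size) if c == "red" and s >= k)
    blue = sum(1 for c, s in zip(group_color, group_size) if c == "blue" and s >= k)
    return red, blue


def is_chasm(ratios: List[float]) -> bool:
    """Finite-window check of Definition 4.2 on ratios ordered by group size."""
    if len(ratios) < 3:
        return False
    peak = max(range(len(ratios)), key=ratios.__getitem__)  # first maximum
    if peak == 0 or peak == len(ratios) - 1:
        return False  # no increasing phase, or no decreasing phase, in the window
    rising = all(a < b for a, b in zip(ratios[:peak], ratios[1:peak + 1]))
    falling = all(a > b for a, b in zip(ratios[peak:], ratios[peak + 1:]))
    return rising and falling


print(is_chasm([0.30, 0.36, 0.41, 0.38, 0.25]))  # True: rises, then falls
print(is_chasm([0.41, 0.38, 0.30, 0.25]))        # False: decreases monotonically
```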

Lemma 4.2.

A network sequence $\{N(M \cup G, t, \Theta)\}$ generated by SHM with selective homophily on the equal-chance mechanism does not exhibit the tail glass-ceiling effect.

Lemma 4.3.

A network sequence $\{N(M \cup G, t, \Theta)\}$ generated by SHM with the general homophily mechanism does not exhibit the chasm effect.

So far, we have shown that, among the homophily variants of SHM, selective homophily on the rich-get-richer mechanism is necessary for the tail glass-ceiling effect and the chasm effect to occur together: selective homophily on the equal-chance mechanism does not create the glass-ceiling effect, and the same level of homophily on the rich-get-richer and equal-chance mechanisms eliminates the chasm effect. It may therefore seem that the equal-chance mechanism plays no role in creating the glass-ceiling and chasm effects (Figure 6-(d)). This is not the case. The following lemma shows that, although homophily on the equal-chance mechanism is not necessary for either effect to emerge, the equal-chance mechanism itself is needed for the chasm effect.

Lemma 4.4.

A sequence of networks $N(M \cup G, t, \Theta)$ generated by SHM without the equal-chance mechanism does not exhibit the chasm effect.

Therefore, the equal-chance mechanism is also necessary for both effects to occur together. We summarize the above findings in Table 1.

Theorem 4.1.

Selective homophily on the rich-get-richer mechanism and the equal-chance mechanism are both necessary for networks generated by SHM to exhibit both the tail glass-ceiling effect and the chasm effect.

Intuitively, the equal-chance mechanism gives small groups a chance to grow, and homophily on the rich-get-richer mechanism allows the majority to grow large groups. Under selective homophily on the rich-get-richer mechanism, because there are more majority groups, minority members are less likely to join groups through the rich-get-richer mechanism; instead, they grow smaller groups. In the long run, there are more minority groups of medium size. When there is no equal-chance mechanism, small groups do not have the chance to grow, so the network does not exhibit the chasm effect. Under selective homophily on the equal-chance mechanism, there is no homophily on the rich-get-richer mechanism, so the majority does not get the chance to grow large groups and there is no glass-ceiling effect. Under the general homophily mechanism, small blue groups grow no slower than small red groups, so the chasm effect does not appear. If we allow different homophily levels for the majority and the minority, small red groups can grow faster than small blue groups, as we will see in the next section. The interaction among these mechanisms in the real world is undoubtedly more complex, but we hope this intuition offers a deeper understanding of the mechanisms driving social disparities.

We now extend the SHM with selective homophily on the rich-get-richer mechanism to a new model that serves two purposes: first, it still captures both the glass-ceiling effect and the chasm effect; second, it has more degrees of freedom and can therefore reproduce real social networks. In this section, we introduce a generalized model, prove the sufficient and necessary conditions for the two effects to occur, and reproduce real datasets with the generalized model. For clarity, we list all notation used in our theoretical presentation in Table 2.

The previous analysis of SHM implies that the level of homophily plays an important role in why large blue groups and small red groups grow faster than their counterparts of the other affiliation. We therefore introduce a new generalized selective homophily model (GSHM) with two sets of new parameters: $\rho^{(u)}_R$ ($\rho^{(u)}_B$) captures the level of red (blue) selective homophily on the equal-chance mechanism, and $\rho^{(p)}_R$ ($\rho^{(p)}_B$) captures the level of red (blue) selective homophily on the rich-get-richer mechanism.

We now present the generalized model in detail. At time $t = 2$, we initialize the bipartite network with one red member connected to a red group and one blue member connected to a blue group. At time $t$, the network grows as follows:
• Member Growth:
  – (minority-majority) with probability $\alpha$ ($0 < \alpha < 1$), a new member $m^*$ joins the network, and it is colored red with probability $r$ ($0 < r \le 1/2$);
  – (rich-get-richer) otherwise, with probability $1-\alpha$, we pick an existing member $m^*$ at random with probability proportional to $\deg(m^*)$.
• Group Growth: with probability $\eta$ ($0 < \eta < 1$), the member creates a group of color $c(m^*)$.
• Connection Growth: with probability $1-\eta$, the member $m^*$ joins an existing group according to the following two steps:
  – (rich-get-richer) with probability $\xi$, $m^*$ picks a group $g^*$ with probability proportional to $\deg(g^*)$. If $c(m^*) = c(g^*)$, $m^*$ joins $g^*$ directly; otherwise, $m^*$ accepts the connection with probability $\rho^{(p)}_{c(m^*)}$. If $m^*$ does not accept the connection, $m^*$ restarts from the beginning of the Connection Growth step until a new connection is built.
  – (equal-chance) with probability $1-\xi$, $m^*$ picks a group $g^*$ uniformly at random.
    If $c(m^*) = c(g^*)$, $m^*$ joins $g^*$ directly; otherwise, $m^*$ accepts the connection with probability $\rho^{(u)}_{c(m^*)}$. If $m^*$ does not accept the connection, $m^*$ restarts from the beginning of the Connection Growth step until a new connection is built.

Under GSHM, when a user decides whether to join a selected group, the probability of accepting depends on both the user's affiliation and the mechanism the user used to pick the group. We illustrate this probability specification in Figure 5-(b). Note that all three homophilous mechanisms are special cases of the GSHM.

Table 2: Notation
General notation:
  $c(x)$: color of node $x \in M \cup G$.
  $\deg(x)$: degree of node $x \in M \cup G$.
Group notation:
  $G_t(C)$: number of groups of color $C$ at time $t$.
  $G_{k,t}(C)$: number of groups of color $C$ with size $k$ at time $t$; $G_k(C) := \lim_{t\to\infty} \mathbb{E}(G_{k,t}(C))/t$.
  $r^{(G)}_t(C)$: group growth rate of color $C$ at time $t$, i.e., $r^{(G)}_t(C) := G_t(C)/t$.
Member notation:
  $M_t(C)$: number of members of color $C$ at time $t$.
  $M_{k,t}(C)$: number of members of color $C$ with degree $k$ at time $t$; $M_k(C) := \lim_{t\to\infty} \mathbb{E}(M_{k,t}(C))/t$.
  $M^{(k)}_t(C)$: number of members of color $C$ contained in groups of size $k$.
  $r^{(M,G)}_{k,t}(C)$: ratio of the expected number of members of color $C$ contained in groups of size $k$ at time $t$.
  $r^{(M,G)}_{k,t}(C_1, C_2)$: ratio of the expected number of members of color $C_1$ in groups of size $k$ with color $C_2$; $r^{(M,G)}_k(C_1, C_2) := \lim_{t\to\infty} r^{(M,G)}_{k,t}(C_1, C_2)$.
Edge notation:
  $E^{(G)}_t(C)$: sum of group sizes of color $C$ at time $t$; $r^{(E,G)}_t(R) := E^{(G)}_t(R)/t$.
  $E^{(M)}_t(C)$: sum of member degrees of color $C$ at time $t$; $r^{(E,M)}_t(R) := E^{(M)}_t(R)/t$.

We now mathematically characterize the degree distributions of the two types of nodes in GSHM and provide the sufficient and necessary conditions for the glass-ceiling effect and the chasm effect to occur.
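The growth rules above translate almost line by line into a simulator. The Python sketch below is our own illustration, not the authors' code, and makes a few assumptions the model text leaves open: a member may be picked again for a group it already belongs to, each edge is stored as one entry in a member list and one in a group list (so a uniform draw from those lists is degree-proportional), and the parameter values in the example are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

RED, BLUE = "R", "B"


def simulate_gshm(steps: int, alpha: float, r: float, eta: float, xi: float,
                  rho_p: Dict[str, float], rho_u: Dict[str, float],
                  seed: int = 0) -> Tuple[List[str], List[int]]:
    """Grow a member-group bipartite network with the GSHM rules (sketch).

    rho_p[c] / rho_u[c]: probability that a member of color c accepts a group of
    the other color picked by the rich-get-richer / equal-chance step.
    Returns (group_color, group_size), both indexed by group id.
    """
    rng = random.Random(seed)
    member_color = [RED, BLUE]  # t = 2: one red member in a red group
    group_color = [RED, BLUE]   # and one blue member in a blue group
    group_size = [1, 1]
    member_ends = [0, 1]        # one entry per edge, so a uniform draw is degree-proportional
    group_ends = [0, 1]

    for _ in range(steps):
        # Member growth: a new member with probability alpha, else an existing one by degree.
        if rng.random() < alpha:
            m = len(member_color)
            member_color.append(RED if rng.random() < r else BLUE)
        else:
            m = rng.choice(member_ends)
        c = member_color[m]

        if rng.random() < eta:
            # Group growth: m founds a new group of its own color.
            g = len(group_color)
            group_color.append(c)
            group_size.append(0)
        else:
            # Connection growth: repeat the two-step pick until m accepts a group.
            while True:
                if rng.random() < xi:   # rich-get-richer: proportional to group size
                    g, accept = rng.choice(group_ends), rho_p[c]
                else:                   # equal-chance: uniform over existing groups
                    g, accept = rng.randrange(len(group_color)), rho_u[c]
                if group_color[g] == c or rng.random() < accept:
                    break
        group_size[g] += 1
        member_ends.append(m)
        group_ends.append(g)
    return group_color, group_size


def counts_by_size(group_color: List[str], group_size: List[int]) -> Dict[int, Tuple[int, int]]:
    """Map each group size k to (number of red groups, number of blue groups)."""
    tally = Counter(zip(group_size, group_color))
    return {k: (tally[(k, RED)], tally[(k, BLUE)]) for k in sorted(set(group_size))}


colors, sizes = simulate_gshm(20_000, alpha=0.3, r=0.3, eta=0.2, xi=0.7,
                              rho_p={RED: 0.5, BLUE: 0.2}, rho_u={RED: 0.9, BLUE: 0.9})
print(len(sizes), sum(sizes))  # one edge per step plus the two initial edges: sum == 20_002
```

`counts_by_size` returns, for each group size, the number of red and blue groups, from which the per-size red group ratio can be read off.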
We first investigate the size distributions of the red and blue groups in a bipartite network generated by the GSHM, and show that the number of red groups of size $k$, $G_k(R)$, and the number of blue groups of size $k$, $G_k(B)$, follow power laws under the GSHM.

Theorem 5.1.

Let $\{N(M \cup G, t, \Theta)\}$ be a sequence of networks produced by the GSHM. Assume that $\rho^{(p)}_R, \rho^{(p)}_B > 0$. The red group-size distribution $G_k(R)$ and the blue group-size distribution $G_k(B)$ asymptotically follow power-law distributions; specifically, as $t$ goes to infinity,
$$G_k(R) \propto k^{-\beta(R)}, \qquad G_k(B) \propto k^{-\beta(B)}, \tag{2}$$
with $\beta(R) = 1 + 1/C_{R,1}$ and $\beta(B) = 1 + 1/C_{B,1}$, where
$$C_{R,1} := \frac{r(1-\eta)\xi}{1-(1-\rho^{(p)}_R)\xi(1-\alpha^*)-(1-\rho^{(u)}_R)(1-\xi)(1-r)} + \frac{(1-r)(1-\eta)\rho^{(p)}_B\xi}{1-(1-\rho^{(p)}_B)\xi\alpha^*-(1-\rho^{(u)}_B)(1-\xi)r}, \tag{3}$$
$$C_{B,1} := \frac{(1-r)(1-\eta)\xi}{1-(1-\rho^{(p)}_B)\xi\alpha^*-(1-\rho^{(u)}_B)(1-\xi)r} + \frac{r(1-\eta)\rho^{(p)}_R\xi}{1-(1-\rho^{(p)}_R)\xi(1-\alpha^*)-(1-\rho^{(u)}_R)(1-\xi)(1-r)}, \tag{4}$$
and $\alpha^*$ is the unique number in $(0,1)$ satisfying
$$\alpha^* = r\eta + \frac{r(1-\eta)\left(\xi\alpha^* + (1-\xi)r\right)}{1-(1-\rho^{(p)}_R)\xi(1-\alpha^*)-(1-\rho^{(u)}_R)(1-\xi)(1-r)} + \frac{(1-r)(1-\eta)\left(\rho^{(p)}_B\xi\alpha^* + \rho^{(u)}_B(1-\xi)r\right)}{1-(1-\rho^{(p)}_B)\xi\alpha^*-(1-\rho^{(u)}_B)(1-\xi)r}. \tag{5}$$

Proof.

The proof employs recursive techniques similar to those of Theorem 4.12 in [3]. One challenging step in our setting is to show that $r^{(E,G)}_t(R) \to \alpha^*$ almost surely as $t \to \infty$. In [3], this is proved by constructing a Doob martingale $\Phi_t := \mathbb{E}[T\, r^{(E,G)}_T(R) \mid \mathcal{F}_t]$ for $0 \le t \le T$, showing that $|\Phi_t - \Phi_{t-1}|$ admits a good bound, and then applying Azuma's inequality for martingales to derive a concentration inequality for $\alpha_t$. This method does not work in our case, because our model is more complicated and we do not have a direct bound on $|\Phi_t - \Phi_{t-1}|$. Instead, we prove that $Z_t := (r^{(E,G)}_t(R) - \alpha^*)^2$ is an almost supermartingale (introduced in [22]), which converges almost surely to a limit random variable $Z_\infty$. We then prove that $\lim_{t\to\infty} \mathbb{E}[Z_t] = 0$, which implies that $Z_\infty$ must equal 0 almost surely and gives the desired result. We defer the detailed proof to the appendix.

We can use similar strategies to show that the member degrees also follow power laws, with the same exponent for red and blue members.
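Equation (5) is a one-dimensional fixed point, so $\alpha^*$ and the constants in (3)-(4) are easy to evaluate numerically. The Python sketch below is our illustration: it finds $\alpha^*$ by bisection on $[0, 1]$ (for parameters in the model's ranges with $\rho^{(p)}_R, \rho^{(p)}_B > 0$, the right-hand side of (5) minus $\alpha$ is positive at $\alpha = 0$ and negative at $\alpha = 1$), then evaluates $C_{R,1}$ and $C_{B,1}$. It also assumes the tail exponents are $\beta(C) = 1 + 1/C_{C,1}$, the exponent produced by a group-size recursion of the form $G_k/G_{k-1} = 1 - (1 + C_{C,1})/(1 + kC_{C,1} + C_{C,2})$. Function names are hypothetical.

```python
from typing import Dict, Tuple

Rho = Dict[str, float]  # color ("R" or "B") -> acceptance probability


def acceptance_norms(a: float, r: float, xi: float, rho_p: Rho, rho_u: Rho) -> Tuple[float, float]:
    """Denominators shared by (3)-(5): the probability that one connection attempt
    by a red (blue) member is accepted, when red groups hold a share a of group edges."""
    d_red = 1 - (1 - rho_p["R"]) * xi * (1 - a) - (1 - rho_u["R"]) * (1 - xi) * (1 - r)
    d_blue = 1 - (1 - rho_p["B"]) * xi * a - (1 - rho_u["B"]) * (1 - xi) * r
    return d_red, d_blue


def alpha_star(r: float, eta: float, xi: float, rho_p: Rho, rho_u: Rho, tol: float = 1e-12) -> float:
    """Solve the fixed-point equation (5) for alpha* in (0, 1) by bisection."""
    def excess(a: float) -> float:
        d_red, d_blue = acceptance_norms(a, r, xi, rho_p, rho_u)
        rhs = (r * eta
               + r * (1 - eta) * (xi * a + (1 - xi) * r) / d_red
               + (1 - r) * (1 - eta) * (rho_p["B"] * xi * a + rho_u["B"] * (1 - xi) * r) / d_blue)
        return rhs - a

    lo, hi = 0.0, 1.0  # excess(0) > 0 > excess(1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tail_exponents(r: float, eta: float, xi: float, rho_p: Rho, rho_u: Rho):
    """Return (alpha*, C_R1, C_B1, beta_R, beta_B), reading beta(C) as 1 + 1/C_{C,1}."""
    a = alpha_star(r, eta, xi, rho_p, rho_u)
    d_red, d_blue = acceptance_norms(a, r, xi, rho_p, rho_u)
    c_r1 = r * (1 - eta) * xi / d_red + (1 - r) * (1 - eta) * rho_p["B"] * xi / d_blue
    c_b1 = (1 - r) * (1 - eta) * xi / d_blue + r * (1 - eta) * rho_p["R"] * xi / d_red
    return a, c_r1, c_b1, 1 + 1 / c_r1, 1 + 1 / c_b1


no_homophily = {"R": 1.0, "B": 1.0}
print(tail_exponents(0.3, 0.2, 0.7, no_homophily, no_homophily))  # alpha* ~ 0.3, C_R1 = C_B1 ~ 0.56
print(tail_exponents(0.3, 0.2, 0.7, {"R": 0.5, "B": 0.2}, {"R": 0.9, "B": 0.9}))
```

With no homophily (all acceptance probabilities equal to 1) the two colors attach identically, so $\alpha^* = r$ and $C_{R,1} = C_{B,1} = (1-\eta)\xi$, which makes a convenient sanity check.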

Theorem 5.2. (proof in appendix) Let $\{N(M \cup G, t, \Theta)\}$ be a sequence of networks produced by GSHM. The red member-degree distribution $M_k(R)$ and the blue member-degree distribution $M_k(B)$ asymptotically follow power-law distributions with the same exponent; specifically, as $t$ goes to infinity,
$$M_k(R) \propto k^{-\frac{2-\alpha}{1-\alpha}}, \qquad M_k(B) \propto k^{-\frac{2-\alpha}{1-\alpha}}. \tag{6}$$

The existence of the tail glass-ceiling effect follows directly from Theorem 5.1.
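Why different group-size exponents produce a tail glass ceiling is easy to see numerically: when $\beta(R) > \beta(B)$, the red tail $\sum_{j \ge k} j^{-\beta(R)}$ shrinks faster than the blue one, so their ratio goes to 0 roughly like $k^{\beta(B)-\beta(R)}$. The Python sketch below uses illustrative exponents chosen by us (not fitted to any dataset) and equal prefactors for both colors, and it approximates the far tail with the first terms of the Euler-Maclaurin formula.

```python
def tail_sum(beta: float, k: int) -> float:
    """Approximate sum_{j >= k} j**(-beta) for beta > 1: exact terms below a cutoff,
    then the integral plus half of the first omitted term (Euler-Maclaurin)."""
    cutoff = max(10 * k, 10_000)
    head = sum(j ** -beta for j in range(k, cutoff))
    return head + cutoff ** (1 - beta) / (beta - 1) + 0.5 * cutoff ** -beta


def tail_ratio(beta_red: float, beta_blue: float, k: int) -> float:
    """Red-to-blue ratio of tail mass at size threshold k (equal prefactors assumed)."""
    return tail_sum(beta_red, k) / tail_sum(beta_blue, k)


for k in (10, 100, 1000):
    print(k, round(tail_ratio(2.5, 2.0, k), 4))  # shrinks roughly like k ** -0.5
```

Unequal prefactors would only rescale the ratio; the decay rate in $k$ is set by the difference of the exponents.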

Corollary 5.1.

Let $\{N(M \cup G, t, \Theta)\}$ be a sequence of networks produced by GSHM, and let $\beta(R), \beta(B)$ be as defined in Theorem 5.1. Then:
• when $\beta(R) < \beta(B)$, $\{N(M \cup G, t, \Theta)\}$ exhibits the tail glass-ceiling effect against the blue groups;
• when $\beta(R) > \beta(B)$, $\{N(M \cup G, t, \Theta)\}$ exhibits the tail glass-ceiling effect against the red groups;
• when $\beta(R) = \beta(B)$ or $\xi = 0$, $\{N(M \cup G, t, \Theta)\}$ does not exhibit the tail glass-ceiling effect.

Proof. Assume $\beta(R) < \beta(B)$, and let $k(n) := n^{1/(\beta(B)-1)}$. Then
$$\mathbb{E}[\mathrm{top}^{(G)}_{k}(B)] = n \sum_{k' \ge k} G_{k'}(B) = O\!\left(n \cdot n^{\frac{1-\beta(B)}{\beta(B)-1}}\right) = O(1), \tag{7}$$
and, for some $\epsilon > 0$,
$$\mathbb{E}[\mathrm{top}^{(G)}_{k}(R)] = n \sum_{k' \ge k} G_{k'}(R) = \Omega\!\left(n \cdot n^{\frac{1-\beta(R)}{\beta(B)-1}}\right) = \Omega\!\left(n^{\frac{\beta(B)-\beta(R)}{\beta(B)-1}}\right) = \Omega(n^{\epsilon}). \tag{8}$$
The case $\beta(R) > \beta(B)$ is symmetric.

Corollary 5.2.

A network sequence $\{N(M \cup G, t, \Theta)\}$ generated by SHM with selective homophily on the equal-chance mechanism exhibits no tail glass-ceiling effect for groups.

Proof. SHM with selective homophily on the equal-chance mechanism corresponds to $\rho^{(u)}_R = \rho^{(u)}_B = \rho$ and $\rho^{(p)}_R = \rho^{(p)}_B = 1$, which yields
$$C_{R,1} = C_{B,1} = \frac{r(1-\eta)\xi}{1-(1-\rho)(1-\xi)(1-r)} + \frac{(1-r)(1-\eta)\xi}{1-(1-\rho)(1-\xi)r}. \tag{9}$$
Hence $\beta(R) = \beta(B)$, and by Corollary 5.1 there is no tail glass-ceiling effect.

5.3.2 Chasm

We are now ready to prove the first result on the monotonicity of the minority ratio in homophilous networks, based on a novel analysis of the distribution. Suppose a network produced by the GSHM has the tail glass-ceiling effect against red groups; the following theorem provides the necessary and sufficient condition for the chasm effect to occur.
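The theorem below compares how quickly red and blue groups of size $k-1$ grow into size $k$. The Python sketch here assumes, for each color $C$, the size recursion $G_k(C)/G_{k-1}(C) = 1 - (1 + C_{C,1})/(1 + kC_{C,1} + C_{C,2})$, where $C_{C,2}$ is the equal-chance counterpart of $C_{C,1}$; it builds the (rescaled) ratio $G_k(R)/G_k(B)$ and compares its peak with the closed-form threshold $k^*$. The constants are hypothetical, chosen so that $C_{R,1} < C_{B,1}$ while red has the larger equal-chance term.

```python
def size_factor(k: int, c1: float, c2: float) -> float:
    """G_k / G_{k-1} for one color under the assumed size recursion."""
    return 1 - (1 + c1) / (1 + k * c1 + c2)


def k_star(c_r1: float, c_r2: float, c_b1: float, c_b2: float) -> float:
    """Size at which the red/blue ratio recursion switches from rising to falling."""
    return ((1 + c_r1) * (1 + c_b2) - (1 + c_r2) * (1 + c_b1)) / (c_r1 - c_b1)


def group_ratio_sequence(kmax: int, c_r1: float, c_r2: float, c_b1: float, c_b2: float) -> list:
    """G_k(R)/G_k(B) for k = 1..kmax, rescaled so the k = 1 value is 1."""
    seq = [1.0]
    for k in range(2, kmax + 1):
        seq.append(seq[-1] * size_factor(k, c_r1, c_r2) / size_factor(k, c_b1, c_b2))
    return seq


# Hypothetical constants: C_R1 < C_B1 (tail glass ceiling against red),
# but red has the larger equal-chance term C_R2.
params = dict(c_r1=0.3, c_r2=3.0, c_b1=0.6, c_b2=1.0)
seq = group_ratio_sequence(40, **params)
peak_k = 1 + max(range(len(seq)), key=seq.__getitem__)
print(k_star(**params), peak_k)  # k* = 38/3 (about 12.67); the ratio peaks at k = 12
```

With $C_{R,2} = \gamma C_{R,1}$ and $C_{B,2} = \gamma C_{B,1}$ (the same homophily on both mechanisms), the threshold collapses to $1 - \gamma$ and the ratio only decreases.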

Theorem 5.3.

Following the same notation as in Theorem 5.1, assume $C_{R,1} < C_{B,1}$. Then the group ratio sequence $\{G_k(R)/G_k(B), k \ge 1\}$ has the chasm effect against red if and only if $k^* > 2$, where
$$k^* := \frac{(1+C_{R,1})(1+C_{B,2}) - (1+C_{R,2})(1+C_{B,1})}{C_{R,1}-C_{B,1}}, \tag{10}$$
with
$$C_{R,2} := \frac{r(1-\eta)(1-\xi)}{\eta\left[1-(1-\rho^{(p)}_R)\xi(1-\alpha^*)-(1-\rho^{(u)}_R)(1-\xi)(1-r)\right]} + \frac{(1-r)(1-\eta)\rho^{(u)}_B(1-\xi)}{\eta\left[1-(1-\rho^{(p)}_B)\xi\alpha^*-(1-\rho^{(u)}_B)(1-\xi)r\right]}; \tag{11}$$
$$C_{B,2} := \frac{(1-r)(1-\eta)(1-\xi)}{\eta\left[1-(1-\rho^{(p)}_B)\xi\alpha^*-(1-\rho^{(u)}_B)(1-\xi)r\right]} + \frac{r(1-\eta)\rho^{(u)}_R(1-\xi)}{\eta\left[1-(1-\rho^{(p)}_R)\xi(1-\alpha^*)-(1-\rho^{(u)}_R)(1-\xi)(1-r)\right]}. \tag{12}$$
Moreover, when $k^* > 2$, the monotonicity of $\{G_k(R)/G_k(B), k \ge 1\}$ changes at $\lceil k^* \rceil - 1$, the largest integer smaller than $k^*$.

Proof. We first define
$$g_{\text{ratio}}(k) := \frac{G_k(R)/G_k(B)}{G_{k-1}(R)/G_{k-1}(B)} = \frac{1 - \frac{1+C_{R,1}}{1+kC_{R,1}+C_{R,2}}}{1 - \frac{1+C_{B,1}}{1+kC_{B,1}+C_{B,2}}}. \tag{13}$$
To determine the monotonicity of $\{G_k(R)/G_k(B)\}$, it suffices to compare $g_{\text{ratio}}(k)$ with 1. Note that
$$g_{\text{ratio}}(k) > 1 \iff \frac{1+C_{R,1}}{1+kC_{R,1}+C_{R,2}} < \frac{1+C_{B,1}}{1+kC_{B,1}+C_{B,2}}. \tag{14}$$
With some algebra, we have
$$\frac{1+C_{R,1}}{1+kC_{R,1}+C_{R,2}} - \frac{1+C_{B,1}}{1+kC_{B,1}+C_{B,2}} = \frac{(k-k^*)(C_{B,1}-C_{R,1})}{(1+kC_{R,1}+C_{R,2})(1+kC_{B,1}+C_{B,2})}. \tag{15}$$
Since the denominator is positive and $C_{B,1} - C_{R,1} > 0$, we have $g_{\text{ratio}}(k) > 1$ for $k < k^*$ and $g_{\text{ratio}}(k) < 1$ for $k > k^*$. When $k^* \le 2$, $g_{\text{ratio}}(k) \le 1$ for all $k \ge 2$, so the sequence is monotonically non-increasing and there is no chasm.

Corollary 5.3.

Following the notation defined in Theorems 5.1 and 5.3, a network sequence $\{N(M \cup G, t, \Theta)\}$ produced by GSHM has a group chasm effect against the red groups if and only if $C_{R,1} < C_{B,1}$ and $k^* > 2$.

Corollary 5.4.

A network sequence $\{N(M \cup G, t, \Theta)\}$ generated by SHM with the general homophily mechanism exhibits no chasm effect.

Proof. General selective homophily is equivalent to setting $\rho^{(u)}_R = \rho^{(p)}_R = \rho^{(u)}_B = \rho^{(p)}_B$ in the GSHM. Comparing (3) and (4) with (11) and (12), there is then a positive constant $\gamma$ (namely $\gamma = (1-\xi)/(\eta\xi)$) such that $C_{R,2} = \gamma C_{R,1}$ and $C_{B,2} = \gamma C_{B,1}$. Substituting this relation into the expression for $k^*$ gives
$$k^* = \frac{(1+C_{R,1})(1+\gamma C_{B,1}) - (1+\gamma C_{R,1})(1+C_{B,1})}{C_{R,1}-C_{B,1}} = 1 - \gamma < 1, \tag{16}$$
so by Theorem 5.3 there is no chasm effect.

Corollary 5.5. A network sequence $\{N(M \cup G, t, \Theta)\}$ generated by SHM without the equal-chance mechanism exhibits no chasm effect.

Proof. Removing the equal-chance (opportunity) mechanism from SHM is equivalent to setting $\xi = 1$ in the GSHM. It is easy to check that $C_{R,2} = C_{B,2} = 0$, and thus $k^* = 1$.

So far, our analysis of bipartite networks has focused mainly on groups. We observed in Section 3.3 that the average member ratio in groups of a fixed size is also non-monotone. The following lemma calculates the average red member ratio among groups of size 1 and among groups whose size goes to infinity. Since the red share among all memberships is $r$, when both values are below $r$ the member ratio must exceed $r$ at some intermediate size and is therefore non-monotone.

Lemma 5.1.

For the red member ratios within groups of size 1 and within groups whose size goes to infinity, we have:
• For groups of size 1,
$$\lim_{t\to\infty} r^{(M,G)}_{1,t}(R) = \frac{G_1(R)}{G_1(R)+G_1(B)} \tag{17}$$
$$= \frac{r(1+C_{B,1}+C_{B,2})}{r(1+C_{B,1}+C_{B,2}) + (1-r)(1+C_{R,1}+C_{R,2})}. \tag{18}$$
• For groups whose size goes to infinity, assuming $C_{R,1} < C_{B,1}$,
$$\lim_{k\to\infty}\lim_{t\to\infty} r^{(M,G)}_{k,t}(R) = r^{(M,G)}_\infty(R), \tag{19}$$
where
$$r^{(M,G)}_\infty(R) = \frac{q_{RB}}{q_{RB}+q_{BB}}, \tag{20}$$
with
$$q_{RB} = r\rho^{(p)}_R\left(1-(1-\rho^{(p)}_B)\xi\alpha^*-(1-\rho^{(u)}_B)(1-\xi)r\right), \tag{21}$$
$$q_{BB} = (1-r)\left(1-(1-\rho^{(p)}_R)\xi(1-\alpha^*)-(1-\rho^{(u)}_R)(1-\xi)(1-r)\right). \tag{22}$$

Proof.

Due to space limits, we provide a proof sketch here; the detailed proof is in Appendix F. For groups of size 1, since every red (blue) group of size 1 contains exactly one red (blue) member, the ratio $r^{(M,G)}_{1,t}(R)$ equals the ratio of red groups among all groups of size 1, which converges to $G_1(R)/(G_1(R)+G_1(B))$. For groups whose size goes to infinity, we first prove (in Lemma F.1) that the proportion of red members in red (blue) groups of size $k$ converges to $p_{RR,k}$ (and $p_{RB,k}$), whose expressions are given in Lemma F.1. Under the condition $C_{R,1} < C_{B,1}$, blue groups dominate the large groups, and thus $r^{(M,G)}_\infty(R)$ is the limit of $p_{RB,k}$ as $k$ goes to infinity, which we analyze in Appendix F.

In the previous sections, we noted that all the real-data observations presented in Sections 3.2 and 3.3 can arise in networks generated by GSHM. A natural question is then: can GSHM reproduce the chasm effect and the glass-ceiling effect of real social networks? We now evaluate how well it does so.

To do so, we first need to infer the parameters from the real dataset. The minority-majority ratio $r$, the member growth rate $\alpha$, and the group growth rate $\eta$ can be calculated directly from the dataset. Note that $\alpha^*$, defined in (62), and $r^{(M,G)}_\infty$, defined in (116), can also be inferred from the dataset. We can then solve for $\xi, \rho^{(p)}_R, \rho^{(p)}_B, \rho^{(u)}_R, \rho^{(u)}_B$ from the system of five equations in (45), (46), and (119).

We test this inference approach on the QQ dataset by simulating 10 networks of size 50,000 with the inferred parameters and taking the mean of the group ratios and member ratios, as well as the average counts of groups of each size. We present our results in Figure 7. The ratio of female-dominated groups demonstrates both the glass-ceiling effect and the chasm effect. Moreover, our simulation locates the group size at which the monotonicity of this ratio changes, which re-confirms our calculation in Theorem 5.3. The average female member ratio again exhibits both the glass-ceiling effect and the chasm effect; however, the simulation does not locate where its monotonicity changes. This inaccuracy is not surprising, as our generalized model only extends SHM by allowing different homophily levels, and we expect real social networks to be more complicated. A more complex model may be needed for better performance.

Figure 7: Model fits: our simulated data captures both the glass-ceiling effect and the chasm. It also captures the distribution of group/member counts for different group sizes (radius of yellow circles).

The presence of hegemony in networks has already been linked to important consequences for the fairness of many graph algorithms [23]. We now present two examples where the chasm effect we identify, which contrasts with the tail effect, sheds light on the fairness of targeted advertisement and content moderation.

The nature of classified ads went through a seismic shift with the advent of Craigslist. Employers posted job opportunities online, giving an additional advantage to people with access to computers and good internet connections. In the last two decades, recruitment strategies have evolved further; prospective employees are targeted on LinkedIn or Facebook based on self-uploaded profiles. But these approaches can be exclusionary or discriminatory, perhaps inadvertently, and expensive. Nowadays, recruiters often post job openings in large interest groups on social networks as a way to organically reach a larger, more diverse audience without paying a premium for targeted advertising or job boards. However, since the make-up of groups is not uniform, the strategy adopted can affect the diversity of candidates applying to open positions. To cast a wide net, the focus may be on larger groups, which may increase the gender imbalance. However, the chasm effect shows that there is a group-size threshold which, if adopted, can help cast a more diverse net, with the job posting reaching more women. Acknowledging the existence of this threshold and attempting to determine the optimal threshold could go a long way toward reducing implicit biases in the hiring process.

In detail, consider the advertising strategy that places ads in groups of size greater than or equal to $k_A$. Let $r^{(A)}(k_A)$ be the ratio of red members among all members seeing the ads, in the limit $t \to \infty$. We have the following theorem, whose proof is deferred to Appendix C.

Theorem 6.1. Assume the red member ratios for very small and very large groups are smaller than the average red member ratio $r$ in the network. Then there exist $0 < k^{\text{lower}}_A \le k^{\text{upper}}_A$ such that
• for $k_A > k^{\text{upper}}_A$, $r^{(A)}(k_A) < r$;
• for $k_A < k^{\text{lower}}_A$, $r^{(A)}(k_A) > r$.

We examine this result empirically on the QQ dataset, and we see (in Figure 8) that we can choose $k^{\text{lower}}_A = k^{\text{upper}}_A = 63$.
That is, if the group-size threshold is larger than 63, the advertising strategy favors males; if it is less than or equal to 63, it favors females.

Figure 8: Advertisement simulations on QQ: consider the advertising strategy that places ads in groups larger than a threshold. As the threshold gets lower, the advertising strategy changes from favoring males to favoring females in terms of member exposures.

The information landscape has deteriorated significantly over the last decade. Conspiracy theories, false rumors about people and events, and hateful content have all been amplified. Group chats and end-to-end encrypted chats have not escaped this fate. They are rife with the same harmful content and have the added problem that many conversations are not subject to scrutiny. One of the many approaches to address this ecosystem is to rely on fact-checkers, who identify pieces of information to verify and provide in-depth analysis of their veracity. Fact-checking organizations scour different parts of the open web and social platforms to build up their databases, and many have also set up tip-lines as an additional force to counteract widespread misinformation. Different fact-checking organizations have different strategies for prioritizing what to fact-check. Typically, prioritization is based on a combination of importance (e.g., elections), relevance (e.g., breaking news events), the number of times an individual piece of content has been flagged, and the number of platforms on which it has been flagged.

In a highly simplified scenario where people have an equal tendency to report fake news when they see it, and fact-checkers always prioritize news with more reports, one could ask whether prioritizing by the number of reports is fair.
As news from larger groups is more likely to be checked, the glass-ceiling effect suggests that, relative to the majority, fake news that originates or spreads among minority members might be less likely to be detected and removed; however, the chasm effect shows that this is not necessarily true.

Assume that the probability of fake news being detected in a group depends on the group size and on the likelihood of all pieces of harmful content being detected. For simplicity, let $\theta \in [0, 1]$ be the strength of the detector, with $\theta = 1$ indicating that all fake news will be detected and $\theta = 0$ indicating that nothing will be flagged for a fact-check. For each group of size $k$, denote by $h(k, \theta)$ the probability that fake news in the group is detected. Equivalently, $h(k, \theta)$ is the expected ratio of detected fake news over all fake news in the group. We assume that the function $h(\cdot, \cdot)$ satisfies:
1. $h(\cdot, \cdot)$ is monotone increasing in group size: $h(k, \theta) < h(k+1, \theta)$;
2. $h(\cdot, \cdot)$ is monotone increasing in detection strength: $h(k, \theta_1) < h(k, \theta_2)$ for $0 \le \theta_1 < \theta_2 \le 1$;
3. $h(k, 0) = 0$, $h(k, 1) = 1$, and
$$\lim_{\theta\to 0} \frac{h(k,\theta)}{h(k+1,\theta)} = 0, \qquad \lim_{\theta\to 1} \frac{1-h(k,\theta)}{1-h(k+1,\theta)} = \infty. \tag{23}$$

We make assumption (1) because fake news in larger groups is likely to be reported more often and therefore has a higher probability of being detected. Assumption (2) makes sense because $\theta$ measures the detector's strength. The last assumption (3) is technical: it means that larger groups dominate smaller groups in the following sense: (i) as the strength of the detector goes to 0, $h(k,\theta)$ goes to 0 faster than $h(k+1,\theta)$; (ii) as the strength of the detector goes to 1, $h(k,\theta)$ goes to 1 more slowly than $h(k+1,\theta)$.

Regard $h(k,\theta)$ as the protection score of a group of size $k$, and let $r^{(D)}(\theta)$ be the ratio of red groups' scores over the total score, that is,
$$r^{(D)}(\theta) = \frac{\sum_{k\ge 1} G_k(R)\,h(k,\theta)}{\sum_{k\ge 1}\left(G_k(R)+G_k(B)\right)h(k,\theta)}. \tag{24}$$
We then have the following theorem, whose proof is deferred to Appendix D.

Theorem 6.2.

Assume that the red group ratio $G_k(R)/(G_k(R)+G_k(B))$ is less than the overall red group ratio $r$ for very small and very large groups in the network as $t \to \infty$. Then there exist $0 < \theta^{\text{lower}} \le \theta^{\text{upper}} < 1$ such that
• for $\theta > \theta^{\text{upper}}$, $r^{(D)}(\theta) > r$;
• for $\theta < \theta^{\text{lower}}$, $r^{(D)}(\theta) < r$.

Proof. We present a proof sketch here; the detailed proof is deferred to Appendix D. As the detection strength goes to 0, by assumption (3) on $h(k,\theta)$, only large groups matter for $r^{(D)}(\theta)$, and among them the ratio of red groups is less than $r$. Hence $\lim_{\theta\to 0} r^{(D)}(\theta) < r$. Similarly, we can show that $r^{(D)}(\theta) > r$ for $\theta$ sufficiently close to 1.

We examine this result empirically on WhatsApp, an end-to-end encrypted chat application, with a simulated fact-checking system. Assume that, for fake news to be detected, it first needs to be reported to a fact-checking organization, which then prioritizes the fact-check. We assume that the number of reports received in a group of size $k$ follows a Poisson distribution with parameter $p \cdot k$, where $p$ captures the tendency to report fake news in the network. Without further assumptions, we set p = 0 . . The fact-checker ranks all reported pieces of content by volume; if two items have the same number of reports, the fact-checker ranks the one from the larger group higher. Finally, the fact-checker sets a percentage threshold $P$ and checks the items ranked within the top $P\%$. We repeat this simulation 100 times and report our findings in Figure 9.

Note that the percentage threshold $P$ plays the role of the detector strength $\theta$. As Figure 9 (a) shows, as more fake news is detected, the protection ratio crosses the average anti-BJP ratio. That is, if fact-checking organizations focus purely on the volume of reports, they favor the majority. If, however, there is an opportunity to devote more resources to fact-checking initiatives, the majority is no longer favored.
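The simulated fact-checking pipeline can be sketched in a few lines. This Python sketch is our illustration under stated assumptions, not the authors' code: each group is assumed to carry exactly one item of fake news, reports are Poisson with mean $p \cdot k$, only reported items enter the ranking, and the returned value is the share of checked items that come from red groups. Poisson draws use Knuth's multiplication method on chunks of the mean (a sum of independent Poisson variables is Poisson), which keeps the sketch dependency-free; the names `poisson` and `red_share_of_checks` and the example population are hypothetical.

```python
import math
import random
from typing import List, Tuple


def poisson(rng: random.Random, lam: float) -> int:
    """Exact Poisson(lam) draw: Knuth's method on chunks with mean <= 10, summed."""
    chunks = max(1, math.ceil(lam / 10))
    threshold = math.exp(-lam / chunks)
    total = 0
    for _ in range(chunks):
        prod = rng.random()
        while prod > threshold:
            total += 1
            prod *= rng.random()
    return total


def red_share_of_checks(groups: List[Tuple[int, str]], p: float,
                        top_percent: float, seed: int = 0) -> float:
    """Fraction of checked items that come from red groups.

    groups: (size, color) pairs, one fake-news item per group (assumption).
    Reports ~ Poisson(p * size); reported items are ranked by report count, ties
    broken by larger group, and the top `top_percent` percent of them are checked.
    """
    rng = random.Random(seed)
    items = [(poisson(rng, p * size), size, color) for size, color in groups]
    reported = [item for item in items if item[0] > 0]
    reported.sort(key=lambda item: (item[0], item[1]), reverse=True)
    checked = reported[:math.floor(len(reported) * top_percent / 100)]
    if not checked:
        return float("nan")
    return sum(1 for _, _, color in checked if color == "R") / len(checked)


# Made-up population: red groups concentrated at medium sizes.
groups = ([(5, "B")] * 300 + [(30, "R")] * 150 + [(30, "B")] * 100
          + [(300, "R")] * 5 + [(300, "B")] * 20)
for P in (5, 50, 100):
    print(P, red_share_of_checks(groups, p=0.01, top_percent=P))
```

Sweeping `top_percent` from small to large values plays the role of increasing the detector strength $\theta$; the example population is made up and does not reproduce the WhatsApp data.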
Similar trends are also found for the ratio of the number of times red groups (vs. blue groups) are checked, the ratio of the total number of people protected in red (vs. blue) groups, and the ratio of the total number of red members (vs. blue members) protected.

Figure 9: Fact-checking simulations on WhatsApp: the glass-ceiling effect indicates that fact-checkers always protect more of the majority; however, as the detection strength increases, fact-checkers start protecting more of the minority than of the majority.

The graphs formed among us, and the structures of groups and communities connecting every individual, govern how today's information propagates and is selectively curated. Bias is quick to emerge and to interact with the simplest network primitives as well as with more complex algorithmic rules, and this bias contributes to unequal opportunity among genders or disproportionate effects along political lines. Our results confirm that homophilous and rich-get-richer dynamics in the graph itself play a critical role in shaping the bias observed across multiple domains, paving the way for finding common ground to counteract observed disparities. As our theoretical results suggest and our empirical results confirm, the bias in the tail and in the bulk of a popularity distribution can vary widely in orientation. We refer to this as a chasm between seemingly opposing views, but explain that its causes do not always lie in disparate treatment and may instead be simple systemic effects of selective homophily. This observation is critical, as previous predictions of algorithmic bias in the tail are sometimes diametrically opposed to what happens when a similar metric is examined at the lower end, including when selecting items for fact-checking or choosing groups for targeted advertisement.
This allowed us to identify the necessary and sufficient conditions for the observed chasm to emerge, but it remains a crude unifying model that leaves many domain-specific effects aside. We hope that our results encourage renewed interest in a holistic view of either equitable representation or fairness guarantees for online content moderation. While each of those applications is beyond the scope of this paper, the empirical presence of a chasm and our simulations already suggest that, to achieve this goal, an analysis that goes beyond a narrow focus on tail effects is critical.

Acknowledgement

We would like to express our appreciation to Dr. Kiran Garimella and Prof. Dean Eckles from the Massachusetts Institute of Technology for their generous help with the collection of the WhatsApp data. We would also like to thank Archis Chowdhury from BOOM for sharing his fact-checking experience with us.

References
[1] David A Cotter, Joan M Hermsen, Seth Ovadia, and Reeve Vanneman. The glass ceiling effect. Social Forces, 80(2):655–681, 2001.
[2] Laurie A Morgan. Glass-ceiling effect or cohort effect? A longitudinal study of the gender earnings gap for engineers, 1982 to 1989. American Sociological Review, pages 479–493, 1998.
[3] Chen Avin, Barbara Keller, Zvi Lotker, Claire Mathieu, David Peleg, and Yvonne-Anne Pignolet. Homophily and the glass ceiling effect in social networks. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pages 41–50, 2015.
[4] Zhaopeng Qu and Zhong Zhao. Glass ceiling effect in urban China: Wage inequality of rural-urban migrants during 2002–2007. China Economic Review, 42:118–144, 2017.
[5] Ana-Andreea Stoica, Christopher Riederer, and Augustin Chaintreau. Algorithmic glass ceiling in social networks: The effects of social recommendations on network diversity. In Proceedings of the 2018 World Wide Web Conference, pages 923–932, 2018.
[6] Matthieu Latapy, Clémence Magnien, and Nathalie Del Vecchio. Basic notions for the analysis of large two-mode networks. Social Networks, 30(1):31–48, 2008.
[7] Stephen P Borgatti and Martin G Everett. Network analysis of 2-mode data. Social Networks, 19(3):243–270, 1997.
[8] Albert-Laszlo Barabási, Hawoong Jeong, Zoltan Néda, Erzsebet Ravasz, Andras Schubert, and Tamas Vicsek. Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3-4):590–614, 2002.
[9] Rashmi Pankajai Bomiriya. Topics in exponential random graph modeling. 2014.
[10] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web, pages 675–684, 2011.
[11] Vahed Qazvinian, Emily Rosengren, Dragomir Radev, and Qiaozhu Mei. Rumor has it: Identifying misinformation in microblogs. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1589–1599, 2011.
[12] Minyoung Huh, Andrew Liu, Andrew Owens, and Alexei A Efros. Fighting fake news: Image splice detection via learned self-consistency. In Proceedings of the European Conference on Computer Vision (ECCV), pages 101–117, 2018.
[13] Yang Liu and Yi-Fang Brook Wu. Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[14] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1):22–36, 2017.
[15] Vanessa Wei Feng and Graeme Hirst. Detecting deceptive opinions with profile compatibility. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 338–346, 2013.
[16] Mahmoudreza Babaei, Abhijnan Chakraborty, Juhi Kulshrestha, Elissa M Redmiles, Meeyoung Cha, and Krishna P Gummadi. Analyzing biases in perception of truth in news stories and their implications for fact checking. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 139–139, 2019.
[17] Zhi-Qiang You, Xiao-Pu Han, Linyuan Lü, and Chi Ho Yeung. Empirical studies on the network of social groups: The case of Tencent QQ. PLoS One, 10(7):e0130538, 2015.
[18] Kiran Garimella and Dean Eckles. Images and misinformation in political groups: Evidence from WhatsApp in India. arXiv preprint arXiv:2005.09784, 2020.
[19] Kiran Garimella and Gareth Tyson. WhatsApp, doc? A first look at WhatsApp public group data. arXiv preprint arXiv:1804.01473, 2018.
[20] Lada A Adamic, Bernardo A Huberman, AL Barabási, R Albert, H Jeong, and G Bianconi. Power-law distribution of the World Wide Web. Science, 287(5461):2115–2115, 2000.
[21] Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444, 2001.
[22] Herbert Robbins and David Siegmund. A convergence theorem for non negative almost supermartingales and some applications. In Optimizing Methods in Statistics, pages 233–257. Elsevier, 1971.
[23] Ana-Andreea Stoica, Jessy Xinyi Han, and Augustin Chaintreau. Seeding network influence in biased networks and the benefits of diversity. In Proceedings of The Web Conference 2020, pages 2089–2098, 2020.
[24] Fan Chung and Linyuan Lu. Complex Graphs and Networks. Number 107. American Mathematical Society, 2006.
[25] Noga Alon and Joel H Spencer. The Probabilistic Method. John Wiley & Sons, 2004.
[26] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

A Proof of Theorem 5.1

Theorem 5.1.

Let $\{N(M \cup G, t, \Theta)\}$ be a sequence of networks produced by the BGMG model. Assume that $\rho^{(p)}_R, \rho^{(p)}_B > 0$. The red group-size distribution $G_k(R)$ and the blue group-size distribution $G_k(B)$ asymptotically follow power-law distributions. Specifically, as $t$ goes to infinity,
$$G_k(R) \propto k^{-\beta(R)}, \qquad G_k(B) \propto k^{-\beta(B)}, \tag{25}$$
with $\beta(R) = 1 + \frac{1}{C_{R,1}}$ and $\beta(B) = 1 + \frac{1}{C_{B,1}}$, where
$$C_{R,1} := \frac{r(1-\eta)\xi}{1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)} + \frac{(1-r)(1-\eta)\rho^{(p)}_B\xi}{1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r}, \tag{26}$$
$$C_{B,1} := \frac{(1-r)(1-\eta)\xi}{1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r} + \frac{r(1-\eta)\rho^{(p)}_R\xi}{1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)}, \tag{27}$$
and $\alpha^*$ is the unique number in $(0,1)$ satisfying
$$\alpha^* = r\eta + \frac{r(1-\eta)\big(\xi\alpha^* + (1-\xi)r\big)}{1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)} + \frac{(1-r)(1-\eta)\big(\rho^{(p)}_B\xi\alpha^* + \rho^{(u)}_B(1-\xi)r\big)}{1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r}. \tag{28}$$

Proof.

We develop a recurrence for $\mathbb{E}(G_{k,t}(R))$. First, define
$$p^{RR}_t(k) := \mathbb{P}(\text{a red member joins a given red group of size } k \text{ at time } t), \tag{29}$$
$$p^{BR}_t(k) := \mathbb{P}(\text{a blue member joins a given red group of size } k \text{ at time } t). \tag{30}$$
By our construction of the model, it is easy to check that
$$p^{RR}_t(k) = \frac{\big(\alpha r + (1-\alpha)r^{(E,M)}_t(R)\big)(1-\eta)\left(\xi\frac{k}{t} + \frac{1-\xi}{G_t(R)+G_t(B)}\right)}{1 - (1-\rho^{(p)}_R)\xi r^{(E,G)}_t(B) - (1-\rho^{(u)}_R)(1-\xi)\frac{G_t(B)}{G_t(R)+G_t(B)}}, \tag{31}$$
$$p^{BR}_t(k) = \frac{\big(\alpha(1-r) + (1-\alpha)(1-r^{(E,M)}_t(R))\big)(1-\eta)\left(\rho^{(p)}_B\xi\frac{k}{t} + \frac{\rho^{(u)}_B(1-\xi)}{G_t(R)+G_t(B)}\right)}{1 - (1-\rho^{(p)}_B)\xi r^{(E,G)}_t(R) - (1-\rho^{(u)}_B)(1-\xi)\frac{G_t(R)}{G_t(R)+G_t(B)}}. \tag{32}$$
Note that a red group of size $k$ at time $t+1$ could have arisen from three scenarios:
1. at time $t$, it was a red group of size $k$, and no new member joins it at time $t+1$;
2. at time $t$, it was a red group of size $k-1$, and a new member joins it at time $t+1$;
3. in the special case $k = 1$, a red group that did not exist at time $t$ can appear if a red member creates it.

Therefore,
$$\mathbb{E}(G_{k,t+1}(R) \mid \mathcal{F}_t) = G_{k,t}(R)\big(1 - p^{RR}_t(k) - p^{BR}_t(k)\big) \tag{33}$$
$$\qquad + G_{k-1,t}(R)\big(p^{RR}_t(k-1) + p^{BR}_t(k-1)\big), \tag{34}$$
where $\mathcal{F}_t$ is the σ-field containing the information of the graph until time $t$. Note that
$$p^{RR}_t(k) + p^{BR}_t(k) = \frac{A_t(R)k + B_t(R)}{t}, \tag{35}$$
where
$$A_t(R) := \frac{\big(\alpha r + (1-\alpha)r^{(E,M)}_t(R)\big)(1-\eta)\xi}{1 - (1-\rho^{(p)}_R)\xi r^{(E,G)}_t(B) - (1-\rho^{(u)}_R)(1-\xi)\frac{G_t(B)}{G_t(R)+G_t(B)}} \tag{36}$$
$$\qquad + \frac{\big(\alpha(1-r) + (1-\alpha)(1-r^{(E,M)}_t(R))\big)(1-\eta)\rho^{(p)}_B\xi}{1 - (1-\rho^{(p)}_B)\xi r^{(E,G)}_t(R) - (1-\rho^{(u)}_B)(1-\xi)\frac{G_t(R)}{G_t(R)+G_t(B)}}, \tag{37}$$
$$B_t(R) := \frac{\big(\alpha r + (1-\alpha)r^{(E,M)}_t(R)\big)(1-\eta)(1-\xi)\frac{t}{G_t(R)+G_t(B)}}{1 - (1-\rho^{(p)}_R)\xi r^{(E,G)}_t(B) - (1-\rho^{(u)}_R)(1-\xi)\frac{G_t(B)}{G_t(R)+G_t(B)}} \tag{38}$$
$$\qquad + \frac{\big(\alpha(1-r) + (1-\alpha)(1-r^{(E,M)}_t(R))\big)(1-\eta)\rho^{(u)}_B(1-\xi)\frac{t}{G_t(R)+G_t(B)}}{1 - (1-\rho^{(p)}_B)\xi r^{(E,G)}_t(R) - (1-\rho^{(u)}_B)(1-\xi)\frac{G_t(R)}{G_t(R)+G_t(B)}}. \tag{39}$$
We then have
$$\mathbb{E}(G_{k,t+1}(R) \mid \mathcal{F}_t) = G_{k,t}(R)\left(1 - \frac{A_t(R)k + B_t(R)}{t}\right) + G_{k-1,t}(R)\frac{A_t(R)(k-1) + B_t(R)}{t}. \tag{40}$$
When $k = 1$, we also account for the probability that a new red group is created, and obtain
$$\mathbb{E}(G_{1,t+1}(R) \mid \mathcal{F}_t) = G_{1,t}(R)\left(1 - \frac{A_t(R) + B_t(R)}{t}\right) + \alpha r\eta + (1-\alpha)r^{(E,M)}_t(R)\eta. \tag{41}$$
By Lemma E.1, we can show that
$$\lim_{t\to\infty} A_t(R) = C_{R,1}, \qquad \lim_{t\to\infty} B_t(R) = C_{R,2}, \quad \text{a.s.}, \tag{42}$$
where
$$C_{R,1} := \frac{r(1-\eta)\xi}{1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)} + \frac{(1-r)(1-\eta)\rho^{(p)}_B\xi}{1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r}, \tag{43}$$
$$C_{R,2} := \frac{r(1-\eta)(1-\xi)/\eta}{1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)} + \frac{(1-r)(1-\eta)\rho^{(u)}_B(1-\xi)/\eta}{1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r}. \tag{44}$$
By Lemma A.1, $G_k(R)$ has the following expressions:
$$G_1(R) = \frac{r\eta}{1 + C_{R,1} + C_{R,2}}, \qquad G_k(R) = G_{k-1}(R)\,\frac{(k-1)C_{R,1} + C_{R,2}}{1 + kC_{R,1} + C_{R,2}} \quad \forall k \ge 2. \tag{45}$$
This completes the proof for $G_k(R)$. The same strategy applied to $G_k(B)$ (blue groups are created with limiting probability $(1-r)\eta$) shows that
$$G_1(B) = \frac{(1-r)\eta}{1 + C_{B,1} + C_{B,2}}, \qquad G_k(B) = G_{k-1}(B)\,\frac{(k-1)C_{B,1} + C_{B,2}}{1 + kC_{B,1} + C_{B,2}} \quad \forall k \ge 2, \tag{46}$$
where
$$C_{B,1} := \frac{(1-r)(1-\eta)\xi}{1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r} + \frac{r(1-\eta)\rho^{(p)}_R\xi}{1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)}, \tag{47}$$
$$C_{B,2} := \frac{(1-r)(1-\eta)(1-\xi)/\eta}{1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r} + \frac{r(1-\eta)\rho^{(u)}_R(1-\xi)/\eta}{1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)}. \tag{48}$$
Applying the same argument as in the proof of [3, Theorem 4.12] completes the proof of the power-law results.

Lemma A.1. [24, Lemma 3.1] Let $(a_t), (b_t), (c_t)$ be three sequences such that $a_{t+1} = \left(1 - \frac{b_t}{t}\right)a_t + c_t$, $\lim_{t\to\infty} b_t = b > 0$, and $\lim_{t\to\infty} c_t = c$. Then $\lim_{t\to\infty} a_t/t$ exists, and its value is
$$\lim_{t\to\infty}\frac{a_t}{t} = \frac{c}{1+b}. \tag{49}$$

B Proof of Theorem 5.2

Lemma 5.2.

Let $\{N(M \cup G, t, \Theta)\}$ be a sequence of networks produced by the BGMG model. The red member-degree distribution $M_k(R)$ and the blue member-degree distribution $M_k(B)$ asymptotically follow power-law distributions with the same exponent. Specifically, as $t$ goes to infinity,
$$M_k(R) \propto k^{-\frac{2-\alpha}{1-\alpha}}, \qquad M_k(B) \propto k^{-\frac{2-\alpha}{1-\alpha}}. \tag{50}$$

Proof.

For any $k > 1$, a red member of degree $k$ at time $t+1$ could have arisen from two scenarios:
1. at time $t$, it was a red member of degree $k$, and it is not chosen at time $t+1$;
2. at time $t$, it was a red member of degree $k-1$, and it is chosen at time $t+1$.

Thus,
$$\mathbb{E}(M_{k,t+1}(R) \mid \mathcal{F}_t) = M_{k,t}(R)\left(1 - (1-\alpha)\frac{k}{t}\right) + M_{k-1,t}(R)\left((1-\alpha)\frac{k-1}{t}\right). \tag{51}$$
When $k = 1$, a red member of degree 1 at time $t+1$ could have arisen from two scenarios:
1. at time $t$, it was a red member of degree 1, and it is not chosen at time $t+1$;
2. it is a new red member that joins the network at time $t+1$.

Hence,
$$\mathbb{E}(M_{1,t+1}(R) \mid \mathcal{F}_t) = M_{1,t}(R)\left(1 - \frac{1-\alpha}{t}\right) + \alpha r. \tag{52}$$
Therefore, by Lemma A.1, $M_k(R)$ has the following expressions:
$$M_1(R) = \frac{\alpha r}{2-\alpha}, \qquad M_k(R) = M_{k-1}(R)\,\frac{(1-\alpha)(k-1)}{1 + (1-\alpha)k}. \tag{53}$$
Hence, $M_k(R) \propto k^{-\frac{2-\alpha}{1-\alpha}}$. Exactly the same argument holds for $M_k(B)$.

C Proof of Theorem 6.1

Theorem 6.1.

Assume the red member ratios for very small and very large groups are smaller than the average red member ratio $r$ in the network. There exist $0 < k^{lower}_A \le k^{upper}_A$ such that
• For $k_A > k^{upper}_A$, $r^{(A)}(k_A) < r$;
• For $k_A < k^{lower}_A$, $r^{(A)}(k_A) > r$.

Proof.

Under our assumption, there exist some $0 < k^{lower}_A \le k^{upper}_A$ such that $\lim_{t\to\infty} r^{(M,G)}_{k,t} < r$ for $k > k^{upper}_A$ and for $k < k^{lower}_A$. Therefore, if $k_A > k^{upper}_A$, the limiting red member ratio is less than $r$ in every group where ads are placed. Consequently, we must have $r^{(A)}(k_A) < r$. On the other hand, if $k_A < k^{lower}_A$, the limiting red member ratio is less than $r$ in every group where ads are not placed. So among all the people not seeing the ads, the red member ratio is less than $r$. This further implies that among all the people seeing the ads, the red member ratio is greater than $r$, that is, $r^{(A)}(k_A) > r$.

D Proof of Theorem 6.2

Theorem 6.2.

Assume the red member ratios for very small and very large groups are smaller than the average red member ratio $r$ in the network. There exist $0 < \theta^{lower} < \theta^{upper} < 1$ such that
• For $\theta > \theta^{upper}$, $r^{(D)}(\theta) > r$;
• For $\theta < \theta^{lower}$, $r^{(D)}(\theta) < r$.

Proof.

Under our assumption, there exist some $0 < k^{lower}_F < k^{upper}_F$ such that, for $k > k^{upper}_F$ and for $k < k^{lower}_F$,
$$\frac{G_k(R)}{G_k(R) + G_k(B)} < r.$$
As $\theta \to 0$, assumption (23) gives
$$\lim_{\theta\to 0}\frac{\sum_{k \le k^{upper}_F} G_k(R)h(k,\theta)}{\sum_{k > k^{upper}_F} G_k(R)h(k,\theta)} = 0, \qquad \lim_{\theta\to 0}\frac{\sum_{k \le k^{upper}_F} G_k(B)h(k,\theta)}{\sum_{k > k^{upper}_F} G_k(B)h(k,\theta)} = 0, \tag{54}$$
which implies that $\lim_{\theta\to 0} r^{(D)}(\theta)/r_{k^{upper}_F}(\theta) = 1$, where
$$r_{k^{upper}_F}(\theta) := \frac{\sum_{k > k^{upper}_F} G_k(R)h(k,\theta)}{\sum_{k > k^{upper}_F}\big(G_k(R)h(k,\theta) + G_k(B)h(k,\theta)\big)} < r. \tag{55}$$
Hence there exists $\theta^{lower} > 0$ such that $r^{(D)}(\theta) < r$ for $\theta < \theta^{lower}$.
As $\theta \to 1$, assumption (23) gives $\lim_{\theta\to 1}\frac{\sum_{k \ge k^{lower}_F} G_k(R)(1 - h(k,\theta))}{\sum_{k \cdots}}$ … $> r$. (59)
Consequently, there exists $0 < \theta^{upper} < 1$ such that $r^{(D)}(\theta) > r$ for $\theta > \theta^{upper}$.

E Lemma E.1

Lemma E.1.

Under the assumption that $\rho^{(p)}_R, \rho^{(p)}_B > 0$, we have the following convergence results:
• The proportion of edges coming from red members converges; that is,
$$\lim_{t\to\infty} r^{(E,M)}_t(R) = r \quad \text{a.s.} \tag{60}$$
• The ratio of the red group count over $t$ converges; that is,
$$\lim_{t\to\infty} r^{(G)}_t(R) = r\eta \quad \text{a.s.} \tag{61}$$
• The proportion of edges coming from red groups converges; that is,
$$\lim_{t\to\infty} r^{(E,G)}_t(R) = \alpha^* \quad \text{a.s.}, \tag{62}$$
where $\alpha^*$ is the unique number in $(0,1)$ satisfying
$$\alpha^* = r\eta + \frac{r(1-\eta)\big(\xi\alpha^* + (1-\xi)r\big)}{1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)} + \frac{(1-r)(1-\eta)\big(\rho^{(p)}_B\xi\alpha^* + \rho^{(u)}_B(1-\xi)r\big)}{1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r}. \tag{63}$$

We divide the proof into three parts.

Part 1. Proof of (60)

Note that $E^{(M)}_t(R)$ is the total degree of red members at time $t$. By our model, given $r^{(E,M)}_t(R)$, the total degree of red members at time $t+1$ can take two values, $E^{(M)}_t(R)$ and $E^{(M)}_t(R) + 1$. Their probabilities are $1 - \alpha r - (1-\alpha)r^{(E,M)}_t(R)$ and $\alpha r + (1-\alpha)r^{(E,M)}_t(R)$, respectively. Recall that $\mathcal{F}_t$ is the σ-field containing the information of the graph up to time $t$. Therefore,
$$\mathbb{E}\big(E^{(M)}_{t+1}(R) \mid \mathcal{F}_t\big) = E^{(M)}_t(R) + \alpha r + (1-\alpha)r^{(E,M)}_t(R), \tag{64}$$
which gives
$$\mathbb{E}\big(r^{(E,M)}_{t+1}(R) - r \mid \mathcal{F}_t\big) = \frac{t + (1-\alpha)}{t+1}\big(r^{(E,M)}_t(R) - r\big). \tag{65}$$
Recall that our model starts from $t = 2$. Therefore,
$$\mathbb{E}\big(r^{(E,M)}_{t+1}(R) - r\big) = \prod_{i=2}^{t}\frac{i + (1-\alpha)}{i+1}\big(r^{(E,M)}_2(R) - r\big) = O\left(\exp\Big(-\sum_{i=2}^{t}\frac{\alpha}{i+1}\Big)\right) = O(t^{-\alpha}). \tag{66}$$
Next we show a concentration inequality for $r^{(E,M)}_t(R)$. For $T > 2$, we define a Doob martingale: for $2 \le t \le T$,
$$W_t := \mathbb{E}\big(r^{(E,M)}_T(R) - r \mid \mathcal{F}_t\big) = \prod_{i=t}^{T-1}\frac{i + (1-\alpha)}{i+1}\big(r^{(E,M)}_t(R) - r\big). \tag{67}$$
Then $\{W_t, 2 \le t \le T\}$ is a martingale, with $W_T = r^{(E,M)}_T(R) - r$ and $W_2 = \mathbb{E}[r^{(E,M)}_T(R) - r]$. Next we bound the difference between $W_t$ and $W_{t-1}$:
$$W_t - W_{t-1} = \prod_{i=t}^{T-1}\frac{i + (1-\alpha)}{i+1}\left(\big(r^{(E,M)}_t(R) - r\big) - \frac{t-\alpha}{t}\big(r^{(E,M)}_{t-1}(R) - r\big)\right). \tag{68}$$
Since $E^{(M)}_t(R)$ can take only the two values $E^{(M)}_{t-1}(R)$ and $E^{(M)}_{t-1}(R) + 1$, we have $|r^{(E,M)}_t(R) - r^{(E,M)}_{t-1}(R)| = O(1/t)$, and thus
$$W_t - W_{t-1} = \prod_{i=t}^{T-1}\frac{i + (1-\alpha)}{i+1}\,O(1/t) = O\left(\exp\Big(-\sum_{i=t}^{T}\frac{\alpha}{i+1}\Big)\right)O(1/t) = O\big(t^{-1}(T/t)^{-\alpha}\big). \tag{69}$$
Applying Azuma's inequality [25] to this martingale, we get that there exist constants $c_1, c_2 > 0$ such that for any $T, x > 0$,
$$\mathbb{P}(|W_T - W_2| > x) \le \exp\left(-\frac{c_1x^2}{T^{-2\alpha}\sum_{j=1}^{T} j^{2\alpha-2}}\right) \le \exp\big(-c_2x^2T^{\min(1,2\alpha)}/\log T\big), \tag{70}$$
where the last step holds because
$$T^{-2\alpha}\sum_{j=1}^{T} j^{2\alpha-2} = \begin{cases} O(T^{-2\alpha}), & \text{if } \alpha < 1/2;\\ O(T^{-1}\log T), & \text{if } \alpha = 1/2;\\ O(T^{-1}), & \text{if } \alpha > 1/2. \end{cases} \tag{71}$$
From (70), for any $\epsilon > 0$, the tail probability $\mathbb{P}\big(|r^{(E,M)}_T(R) - \mathbb{E}[r^{(E,M)}_T(R)]| > \epsilon\big) = \mathbb{P}(|W_T - W_2| > \epsilon)$ is summable over $T$. By the Borel–Cantelli lemma, $r^{(E,M)}_t(R) - \mathbb{E}[r^{(E,M)}_t(R)] \to 0$ a.s., which, together with (66), gives the desired result. Moreover, since we have already shown that $W_2 = O(T^{-\alpha})$, there exist constants $c_3, c_4 > 0$ such that for any $x > 0$,
$$\mathbb{P}(|W_T| > x) \le \mathbb{P}(|W_T - W_2| > x - |W_2|) \le \exp\big(-c_3\max(x - |W_2|, 0)^2\,T^{\min(1,2\alpha)}/\log T\big) \le \exp\big(-c_4x^2T^{\min(1,2\alpha)}/\log T\big). \tag{72}$$

Part 2. Proof of (61)

According to our model, at each time $t$, a red member adds an edge with probability $\alpha r + (1-\alpha)r^{(E,M)}_t(R)$. With probability $\eta$, that edge is added by creating a new red group. Let us consider the number of red groups in the model conditioned on a given sequence $\{r^{(E,M)}_t(R), t \ge 2\}$. For each $t$, there are two cases.

Case (1): $E^{(M)}_{t+1}(R) = E^{(M)}_t(R) + 1$. In this case a red member adds an edge at time $t$. Conditioned on $\{r^{(E,M)}_t(R), t \ge 2\}$, the probability that this edge is added by creating a new red group is $\eta$. The reason is that the way this edge is added does not influence the value of $r^{(E,M)}_{t+1}(R)$, and thus does not influence $\{r^{(E,M)}_t(R), t \ge 2\}$. Hence, conditioning on $\{r^{(E,M)}_t(R), t \ge 2\}$ or not does not change the probability that the new edge is added by creating a group.

Case (2): $E^{(M)}_{t+1}(R) = E^{(M)}_t(R)$. In this case a blue member adds an edge at time $t$, and no red group is created.

We also have that the events $\{\text{a red group is created at time } t\}$ for different $t$ are independent conditioned on $\{r^{(E,M)}_t(R), t \ge 2\}$. Intuitively, this is because the probability of $\{\text{a red group is created at time } t\}$ depends only on the value of $E^{(M)}_{t+1}(R) - E^{(M)}_t(R)$. The independence claim can also be verified by writing out the posterior distribution of those events given $\{r^{(E,M)}_t(R), t \ge 2\}$.

Recall that our initial condition is a red (blue) member with an edge to a red (blue) group, giving two members and two groups in total. Therefore, given $\{r^{(E,M)}_t(R), t \ge 2\}$, the number of red groups $G_t(R)$ satisfies that $G_t(R) - 1$ follows the binomial distribution $B(E^{(M)}_t(R) - 1, \eta)$.
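The binomial description of $G_t(R) - 1$ above is what makes Hoeffding's inequality applicable in the next step. The Python sketch below is a rough illustration and is not part of the original argument. It fixes a hypothetical number $n$ of red-member edges, standing in for $E^{(M)}_t(R) - 1$, and repeatedly draws $G_t(R) - 1 \sim B(n, \eta)$. It then compares the empirical two-sided tail frequency with the Hoeffding bound $2\exp(-2nx^2)$. The values of $n$, $\eta$, $x$, the number of replicas, and the helper names are illustrative assumptions.

```python
import math
import random


def hoeffding_bound(n: int, x: float) -> float:
    """Two-sided Hoeffding bound 2 exp(-2 n x^2) for the mean of n Bernoulli trials."""
    return 2.0 * math.exp(-2.0 * n * x * x)


def empirical_tail(n: int, eta: float, x: float, reps: int, seed: int = 0) -> float:
    """Fraction of replicas with |X/n - eta| > x, where X ~ Binomial(n, eta).

    n stands in for E_t^{(M)}(R) - 1 (red-member edges after the initial one),
    X for G_t(R) - 1 (red groups created by those edges).
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        # Each red-member edge independently creates a new red group w.p. eta.
        created = sum(1 for _ in range(n) if rng.random() < eta)
        if abs(created / n - eta) > x:
            exceed += 1
    return exceed / reps


# Hypothetical values chosen only for illustration.
n, eta, x = 400, 0.3, 0.05
emp = empirical_tail(n, eta, x, reps=1000)
bound = hoeffding_bound(n, x)
print(f"empirical tail = {emp:.3f}, Hoeffding bound = {bound:.3f}")
```

The empirical frequency typically sits well below the bound. Hoeffding's inequality uses only the boundedness of each trial. In the proof it only needs to make the tail probabilities summable in $t$, so this looseness is harmless.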
Therefore, by Hoeffding's inequality [26], for any $x > 0$,
$$\mathbb{P}\left(\left|\frac{G_t(R) - 1}{E^{(M)}_t(R) - 1} - \eta\right| > x \,\middle|\, \{r^{(E,M)}_t(R), t \ge 2\}\right) \le 2\exp\big(-2(E^{(M)}_t(R) - 1)x^2\big), \tag{73}$$
which further implies that
$$\mathbb{P}\left(\left|r^{(G)}_t(R) - \eta r^{(E,M)}_t(R) + \frac{\eta - 1}{t}\right| > x \,\middle|\, \{r^{(E,M)}_t(R), t \ge 2\}\right) \le 2\exp\left(-\frac{2x^2t^2}{E^{(M)}_t(R) - 1}\right) \le 2\exp\left(-\frac{2x^2t^2}{t - 1}\right). \tag{74}$$
Hence, for any $\epsilon > 0$, with probability 1 the tail probability $\mathbb{P}\left(\left|r^{(G)}_t(R) - \eta r^{(E,M)}_t(R) + \frac{\eta - 1}{t}\right| > \epsilon \,\middle|\, \{r^{(E,M)}_t(R), t \ge 2\}\right)$ is summable over $t$. By the Borel–Cantelli lemma, $r^{(G)}_t(R) - \eta r^{(E,M)}_t(R) + (\eta - 1)/t \to 0$ a.s. Together with $r^{(E,M)}_t(R) \to r$ a.s., this gives $r^{(G)}_t(R) \to r\eta$ a.s.
Moreover, the triangle inequality gives
$$\left|r^{(G)}_t(R) - r\eta\right| \le \left|r^{(G)}_t(R) - \eta r^{(E,M)}_t(R) + \frac{\eta - 1}{t}\right| + \frac{1-\eta}{t} + \left|\eta r^{(E,M)}_t(R) - \eta r\right|. \tag{75}$$
Hence, for any $x > 0$,
$$\left\{\left|r^{(G)}_t(R) - r\eta\right| > x\right\} \subset \left\{\left|r^{(G)}_t(R) - \eta r^{(E,M)}_t(R) + \frac{\eta - 1}{t}\right| > \frac{x}{2} - \frac{1-\eta}{t}\right\} \cup \left\{\left|\eta r^{(E,M)}_t(R) - \eta r\right| > \frac{x}{2}\right\}. \tag{76}$$
Therefore, for the unconditional tail probability of $r^{(G)}_t(R) - r\eta$, we have
$$\mathbb{P}\left(\left|r^{(G)}_t(R) - r\eta\right| > x\right) \le \mathbb{P}\left(\left|r^{(G)}_t(R) - \eta r^{(E,M)}_t(R) + \frac{\eta - 1}{t}\right| > \frac{x}{2} - \frac{1-\eta}{t}\right) + \mathbb{P}\left(\left|r^{(E,M)}_t(R) - r\right| \ge \frac{x}{2\eta}\right).$$
Note that the unconditional version of (74) also holds, since its right-hand side does not depend on $\{r^{(E,M)}_t(R), t \ge 2\}$. Combining this with (72), which supplies the constant $c_4$, we obtain a constant $c_5 > 0$ such that for any $x > 0$,
$$\mathbb{P}\left(\left|r^{(G)}_t(R) - r\eta\right| > x\right) \le 2\exp\left(-\frac{2\big(x/2 - (1-\eta)/t\big)^2t^2}{t - 1}\right) + \exp\left(-c_4\Big(\frac{x}{2\eta}\Big)^2t^{\min(1,2\alpha)}/\log t\right) \le \exp\big(-c_5x^2t^{\min(1,2\alpha)}/\log t\big). \tag{77}$$

Part 3. Proof of (62) and (63)

Recall that $E^{(G)}_t(R)$ is the total degree of red groups. As in Part 1, at each time $t+1$, $E^{(G)}_{t+1}(R)$ can take two values: $E^{(G)}_t(R)$ and $E^{(G)}_t(R) + 1$. By our definition of the model, one can verify that the probability that $E^{(G)}_{t+1}(R) = E^{(G)}_t(R) + 1$ is a function of $r^{(E,G)}_t(R)$, $r^{(E,M)}_t(R)$, $r^{(G)}_t(R)$, and $r^{(G)}_t(B)$. We denote it by $H(r^{(E,G)}_t(R), r^{(E,M)}_t(R), r^{(G)}_t(R), r^{(G)}_t(B))$. It takes the following form:
$$H(x,y,z,w) := (\alpha r + (1-\alpha)y)\eta + \frac{(\alpha r + (1-\alpha)y)(1-\eta)\big(\xi x + (1-\xi)\frac{z}{z+w}\big)}{1 - (1-\rho^{(p)}_R)\xi(1-x) - (1-\rho^{(u)}_R)(1-\xi)\frac{w}{w+z}} \tag{78}$$
$$\qquad + \frac{(\alpha(1-r) + (1-\alpha)(1-y))(1-\eta)\big(\rho^{(p)}_B\xi x + \rho^{(u)}_B(1-\xi)\frac{z}{z+w}\big)}{1 - (1-\rho^{(p)}_B)\xi x - (1-\rho^{(u)}_B)(1-\xi)\frac{z}{w+z}}. \tag{79}$$
We have already seen that $r^{(E,M)}_t(R) \to r$ a.s. and $r^{(G)}_t(R) \to r\eta$ a.s. Similarly, $r^{(G)}_t(B) \to (1-r)\eta$ a.s. We denote
$$F(x) = H(x, r, r\eta, (1-r)\eta) \tag{80}$$
$$\quad = r\eta + \frac{r(1-\eta)\big(\xi x + (1-\xi)r\big)}{1 - (1-\rho^{(p)}_R)\xi(1-x) - (1-\rho^{(u)}_R)(1-\xi)(1-r)} + \frac{(1-r)(1-\eta)\big(\rho^{(p)}_B\xi x + \rho^{(u)}_B(1-\xi)r\big)}{1 - (1-\rho^{(p)}_B)\xi x - (1-\rho^{(u)}_B)(1-\xi)r}. \tag{81}$$
We have the following lemma, whose proof is deferred to Appendix G.

Lemma E.2.

Under the assumption that $\rho^{(p)}_R, \rho^{(p)}_B > 0$, $F(x)$ satisfies:
1. $F(x)$ has exactly one fixed point, denoted $\alpha^*$, in $[0,1]$;
2. there exists $\gamma < 1$ such that for any $x \in (0,1)$,
$$|F(\alpha^*) - F(x)| \le \gamma|\alpha^* - x|. \tag{82}$$

Let $\alpha^* \in (0,1)$ be the number satisfying $F(\alpha^*) = \alpha^*$. As in Part 1, we can calculate the conditional second moment of $r^{(E,G)}_{t+1}(R) - \alpha^*$. For brevity, write $H_t := H(r^{(E,G)}_t(R), r^{(E,M)}_t(R), r^{(G)}_t(R), r^{(G)}_t(B))$. Then
$$\mathbb{E}\big((r^{(E,G)}_{t+1}(R) - \alpha^*)^2 \mid \mathcal{F}_t\big) = \left(\frac{t\,r^{(E,G)}_t(R)}{t+1} - \alpha^*\right)^2(1 - H_t) + \left(\frac{t\,r^{(E,G)}_t(R) + 1}{t+1} - \alpha^*\right)^2 H_t \tag{83}$$
$$= I^{(1)}_t + I^{(2)}_t + I^{(3)}_t, \tag{84}$$
where
$$I^{(1)}_t = \frac{t^2(r^{(E,G)}_t(R) - \alpha^*)^2 + 2t(r^{(E,G)}_t(R) - \alpha^*)\big((1-\alpha^*)F(r^{(E,G)}_t(R)) - \alpha^*(1 - F(r^{(E,G)}_t(R)))\big)}{(t+1)^2}, \tag{85}$$
$$I^{(2)}_t = \frac{(\alpha^*)^2(1 - H_t) + (1-\alpha^*)^2H_t}{(t+1)^2}, \tag{86}$$
$$I^{(3)}_t = \frac{2t(r^{(E,G)}_t(R) - \alpha^*)\big((1-\alpha^*)\Delta_t + \alpha^*\Delta_t\big)}{(t+1)^2} = \frac{2t(r^{(E,G)}_t(R) - \alpha^*)\Delta_t}{(t+1)^2}, \tag{87}$$
with $\Delta_t := \Delta(r^{(E,G)}_t(R), r^{(E,M)}_t(R), r^{(G)}_t(R), r^{(G)}_t(B))$ and
$$\Delta(x,y,z,w) := H(x,y,z,w) - F(x). \tag{88}$$
We need the following lemmas.

Lemma E.3.

Under the assumption that $\rho^{(p)}_R, \rho^{(p)}_B > 0$, there exists $c > 0$ such that for any $x, y, z, w \in (0,1)$,
$$|\Delta(x,y,z,w)| < c\big(|y - r| + |z - r\eta| + |w - (1-r)\eta|\big). \tag{89}$$
We omit the proof of Lemma E.3. It can be verified directly by checking that the first derivatives of $H(\cdot)$ are bounded.

Lemma E.4.

We have that
$$\lim_{T\to\infty}\sum_{t=1}^{T}\frac{|r^{(E,M)}_t(R) - r| + |r^{(G)}_t(R) - r\eta| + |r^{(G)}_t(B) - (1-r)\eta|}{t} < \infty \quad \text{a.s.}, \tag{90}$$
and
$$\lim_{T\to\infty}\sum_{t=1}^{T}\frac{\mathbb{E}\big[|r^{(E,M)}_t(R) - r| + |r^{(G)}_t(R) - r\eta| + |r^{(G)}_t(B) - (1-r)\eta|\big]}{t} < \infty. \tag{91}$$
The proof of Lemma E.4 is deferred to Appendix G.

Next we bound $I^{(1)}_t$, $I^{(2)}_t$, and $I^{(3)}_t$. For $I^{(1)}_t$, since $F(\alpha^*) = \alpha^*$, we have $(1-\alpha^*)F(x) - \alpha^*(1 - F(x)) = F(x) - \alpha^*$. By Lemma E.2, $(x - \alpha^*)(F(x) - F(\alpha^*)) \le \gamma(x - \alpha^*)^2$. Hence
$$I^{(1)}_t = \frac{t^2(r^{(E,G)}_t(R) - \alpha^*)^2 + 2t(r^{(E,G)}_t(R) - \alpha^*)(F(r^{(E,G)}_t(R)) - \alpha^*)}{(t+1)^2} \tag{92}$$
$$\le (r^{(E,G)}_t(R) - \alpha^*)^2\left(1 - \frac{2t(1-\gamma)}{(t+1)^2}\right). \tag{93}$$
For $I^{(2)}_t$, since $H(\cdot)$ is bounded by 1, there is a constant $c > 0$ such that
$$I^{(2)}_t \le \frac{c}{(t+1)^2}. \tag{94}$$
From the expression of $I^{(3)}_t$, it is easy to see that for some $c > 0$,
$$|I^{(3)}_t| < \frac{c\,\big|\Delta(r^{(E,G)}_t(R), r^{(E,M)}_t(R), r^{(G)}_t(R), r^{(G)}_t(B))\big|}{t}. \tag{95}$$
Further, by Lemma E.3 and Lemma E.4, we have
$$\lim_{T\to\infty}\sum_{t=1}^{T} I^{(3)}_t < \infty \ \text{a.s.}, \qquad \lim_{T\to\infty}\sum_{t=1}^{T}\mathbb{E}[I^{(3)}_t] < \infty. \tag{96}$$
We need the following lemma, whose proof is deferred to Appendix G.

Lemma E.5.

Let $(a_t), (b_t), (c_t)$ be three positive sequences such that $a_{t+1} \le b_ta_t + c_t$, $b_t < 1$, $\lim_{t\to\infty}\prod_{i=1}^{t} b_i = 0$, and $\lim_{t\to\infty}\sum_{i=1}^{t} c_i < \infty$. Then $\lim_{t\to\infty} a_t = 0$.

Let
$$Z_t = (r^{(E,G)}_t(R) - \alpha^*)^2, \quad a_t = \mathbb{E}(Z_t), \quad b_t = 1 - \frac{2(1-\gamma)t}{(t+1)^2}, \quad c_t = \mathbb{E}[I^{(2)}_t + I^{(3)}_t]. \tag{97}$$
Taking expectations in (84), we have $a_{t+1} \le b_ta_t + c_t$. It is straightforward to check the conditions $b_t < 1$ and $\lim_{t\to\infty}\prod_{i=1}^{t} b_i = 0$. By (94) and (96), $\lim_{t\to\infty}\sum_{i=1}^{t} c_i < \infty$. Thus, by Lemma E.5,
$$\lim_{t\to\infty}\mathbb{E}(Z_t) = 0. \tag{98}$$
Our goal is equivalent to showing that $Z_t \to 0$ a.s. We claim it is enough to show that $Z_t$ converges almost surely to some limit random variable as $t \to \infty$. Indeed, suppose $\lim_{t\to\infty} Z_t$ exists a.s. Since $Z_t$ is bounded, the bounded convergence theorem gives $\mathbb{E}(\lim_{t\to\infty} Z_t) = 0$. Since $Z_t \ge 0$, its limit is nonnegative, so a zero expectation forces $\lim_{t\to\infty} Z_t = 0$ a.s.

Now we show that $\lim_{t\to\infty} Z_t$ exists a.s. by checking that $\{Z_t\}$ is an almost supermartingale; by [22], every almost supermartingale converges almost surely to a limit random variable. Also by [22], $\{Z_t\}$ is an almost supermartingale as long as $\lim_{T\to\infty}\sum_{t=1}^{T}(I^{(2)}_t + I^{(3)}_t) < \infty$ a.s., which we have already proved. This finishes the proof.

F Proof of Lemma 5.1

Lemma F.1.

We have that
$$\lim_{t\to\infty} r^{(M,G)}_{k,t}(R,R) = r^{(M,G)}_k(R,R) := \frac{1 + \sum_{j=2}^{k} p_{RR,j}}{k}, \tag{99}$$
$$\lim_{t\to\infty} r^{(M,G)}_{k,t}(R,B) = r^{(M,G)}_k(R,B) := \frac{1 + \sum_{j=2}^{k} p_{RB,j}}{k}, \tag{100}$$
where
$$p_{RR,j} = \frac{p^{(0)}_{RR,j}}{p^{(0)}_{RR,j} + p^{(0)}_{BR,j}}, \qquad p_{RB,j} = \frac{p^{(0)}_{RB,j}}{p^{(0)}_{RB,j} + p^{(0)}_{BB,j}}, \tag{101}$$
$$p^{(0)}_{RR,j} = \frac{r\big(\xi j + (1-\xi)/\eta\big)}{1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)}, \tag{102}$$
$$p^{(0)}_{BR,j} = \frac{(1-r)\big(\rho^{(p)}_B\xi j + \rho^{(u)}_B(1-\xi)/\eta\big)}{1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r}, \tag{103}$$
$$p^{(0)}_{RB,j} = \frac{r\big(\rho^{(p)}_R\xi j + \rho^{(u)}_R(1-\xi)/\eta\big)}{1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)}, \tag{104}$$
$$p^{(0)}_{BB,j} = \frac{(1-r)\big(\xi j + (1-\xi)/\eta\big)}{1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r}. \tag{105}$$

Proof. We only prove the result for red groups. For a red group $J_R$ of size $j$ at time $t$, define the events
$$\Gamma_{t,j} := \{\text{at time } t, \text{ an edge between a member and } J_R \text{ is added}\}, \tag{106}$$
$$\Gamma_{t,j,R} := \{\text{at time } t, \text{ an edge between a red member and } J_R \text{ is added}\}, \tag{107}$$
$$\Gamma_{t,j,B} := \{\text{at time } t, \text{ an edge between a blue member and } J_R \text{ is added}\}. \tag{108}$$
By the definition of our model and Lemma E.1, as $t \to \infty$,
$$\mathbb{P}(\Gamma_{t,j,R})\cdot t = \frac{\big(\alpha r + (1-\alpha)r^{(E,M)}_t(R)\big)(1-\eta)\left(\xi j + \frac{1-\xi}{r^{(G)}_t(R) + r^{(G)}_t(B)}\right)}{1 - (1-\rho^{(p)}_R)\xi\big(1 - r^{(E,G)}_t(R)\big) - (1-\rho^{(u)}_R)(1-\xi)\frac{r^{(G)}_t(B)}{r^{(G)}_t(R) + r^{(G)}_t(B)}} \to (1-\eta)\,p^{(0)}_{RR,j}.$$
Similarly,
$$\mathbb{P}(\Gamma_{t,j,B})\cdot t \to (1-\eta)\,p^{(0)}_{BR,j}. \tag{109}$$
By Bayes' formula, the common factor $(1-\eta)$ cancels, and as $t \to \infty$,
$$\mathbb{P}(\Gamma_{t,j,R} \mid \Gamma_{t,j}) = \frac{\mathbb{P}(\Gamma_{t,j,R})}{\mathbb{P}(\Gamma_{t,j,R}) + \mathbb{P}(\Gamma_{t,j,B})} \to p_{RR,j}. \tag{110}$$
At time $t$, we choose a red group $J_{k,R}$ uniformly at random among the red groups of size $k$. Define $t_1 < \dots < t_k$ such that $t_j$ is the time a new member $M_j$ joins the chosen group $J_{k,R}$. By the construction of our model, $t_1$ must be the time the group is created, and the first member must be red. For each $j > 1$, the group has size $j-1$ at time $t_j$. As the graph size $t$ goes to infinity, $t_j \to \infty$ for each $j$, because $J_{k,R}$ is chosen uniformly. Therefore,
$$\mathbb{E}[\text{number of red members in } J_{k,R}] \to 1 + \sum_{j=2}^{k} p_{RR,j}. \tag{111}$$
Recall that $G_{k,t}(R)$ is the number of red groups of size $k$ at time $t$. Since $J_{k,R}$ is chosen uniformly, we have
$$\frac{\mathbb{E}[\text{number of red members in red groups of size } k]}{G_{k,t}(R)} \to 1 + \sum_{j=2}^{k} p_{RR,j}. \tag{112}$$
This finishes the proof, since
$$r^{(M,G)}_{k,t}(R,R) = \frac{\mathbb{E}[\text{number of red members in red groups of size } k]}{k\,G_{k,t}(R)}. \tag{113}$$

Corollary F.1.

We have that
$$\lim_{t\to\infty} r^{(M,G)}_{k,t}(R) = \frac{G_k(R)\,r^{(M,G)}_k(R,R)}{G_k(R)\,r^{(M,G)}_k(R,R) + G_k(B)\,r^{(M,G)}_k(R,B)}. \tag{114}$$

Lemma F.2.

For the red member ratios within groups of size 1, and within groups whose size goes to infinity, we have:
• For groups of size 1,
$$\lim_{t\to\infty} r^{(M,G)}_{1,t}(R) = \frac{G_1(R)}{G_1(R) + G_1(B)} = \frac{r\,(1 + C_{B,1} + C_{B,2})}{r\,(1 + C_{B,1} + C_{B,2}) + (1-r)(1 + C_{R,1} + C_{R,2})}. \tag{116}$$
• For groups whose size goes to infinity, assuming $C_{R,1} < C_{B,1}$,
$$\lim_{k\to\infty}\lim_{t\to\infty} r^{(M,G)}_{k,t}(R) = r^{(M,G)}(R), \tag{118}$$
where $r^{(M,G)}(R)$ is defined as
$$r^{(M,G)}(R) = \frac{q_{RB}}{q_{RB} + q_{BB}}, \tag{119}$$
with
$$q_{RB} = r\rho^{(p)}_R\big(1 - (1-\rho^{(p)}_B)\xi\alpha^* - (1-\rho^{(u)}_B)(1-\xi)r\big), \tag{120}$$
$$q_{BB} = (1-r)\big(1 - (1-\rho^{(p)}_R)\xi(1-\alpha^*) - (1-\rho^{(u)}_R)(1-\xi)(1-r)\big). \tag{121}$$

Proof.

We follow Lemma F.1. For $r^{(M,G)}_{1,t}(R)$, a red (blue) group of size 1 contains exactly one red (blue) member, so
$$r^{(M,G)}_{1,t}(R) = \frac{G_{1,t}(R)}{G_{1,t}(R) + G_{1,t}(B)} \to \frac{G_1(R)}{G_1(R) + G_1(B)}. \tag{122}$$
Now consider $k \to \infty$. The assumption $C_{R,1} < C_{B,1}$ gives $\beta(R) = 1 + 1/C_{R,1} > 1 + 1/C_{B,1} = \beta(B)$ by Theorem 5.1, which is a glass-ceiling effect against red members. Hence $G_k(R)/G_k(B) \to 0$ as $k \to \infty$, and we only need to focus on blue groups. As $j \to \infty$, it is easy to check that
$$\lim_{j\to\infty} p_{RB,j} = r^{(M,G)}(R), \tag{123}$$
and consequently
$$\lim_{k\to\infty} r^{(M,G)}_k(R) = r^{(M,G)}(R), \tag{124}$$
which finishes the proof.

G Proofs of Auxiliary Lemmas

G.1 Proof of Lemma E.5

Proof.

It is enough to show that, for any $\epsilon > 0$, there exists $T > 0$ such that $a_t < \epsilon$ for all $t > T$. First, since $c_t$ is summable, we can find $T_1 > 0$ such that $\sum_{t > T_1} c_t < \epsilon/2$. Also, since $\lim_{t\to\infty}\prod_{i=1}^{t} b_i = 0$, we can find $T_2 > T_1$ such that $\prod_{i=T_1+1}^{t-1} b_i \cdot \big(a_1 + \sum_{i \ge 1} c_i\big) < \epsilon/2$ for all $t > T_2$. We claim that $T_2$ is the desired $T$. For convenience, we denote $c_0 := a_1$ in the rest of the proof. By induction, we obtain the following bound for $a_t$:
$$a_t \le \prod_{i=1}^{t-1} b_i\,c_0 + \prod_{i=2}^{t-1} b_i\,c_1 + \prod_{i=3}^{t-1} b_i\,c_2 + \dots + c_{t-1} = \sum_{s=0}^{t-1}\prod_{i=s+1}^{t-1} b_i\,c_s. \tag{125}$$
We split the sum on the right-hand side into the terms with $s \le T_1$ and those with $s > T_1$. Fix any $t > T_2$. For the first part, our choice of $T_2$ and the fact that $b_i < 1$ give
$$\sum_{s=0}^{T_1}\prod_{i=s+1}^{t-1} b_i\,c_s \le \sum_{s=0}^{T_1}\prod_{i=T_1+1}^{t-1} b_i\,c_s = \prod_{i=T_1+1}^{t-1} b_i\sum_{s=0}^{T_1} c_s < \epsilon/2. \tag{126}$$
For the second part, our choice of $T_1$ and the fact that $b_i < 1$ give
$$\sum_{s=T_1+1}^{t-1}\prod_{i=s+1}^{t-1} b_i\,c_s \le \sum_{s=T_1+1}^{t-1} c_s < \epsilon/2. \tag{127}$$
Combining the two inequalities, and since $\epsilon$ is arbitrary, we finish the proof.

G.2 Proof of Lemma E.4

Proof.

First, it is enough to show (91). If it holds, the monotone convergence theorem gives
$$\mathbb{E}\left[\lim_{T\to\infty}\sum_{t=1}^{T}\frac{|r^{(E,M)}_t(R) - r| + |r^{(G)}_t(R) - r\eta| + |r^{(G)}_t(B) - (1-r)\eta|}{t}\right] \tag{128}$$
$$= \lim_{T\to\infty}\sum_{t=1}^{T}\frac{\mathbb{E}\big[|r^{(E,M)}_t(R) - r| + |r^{(G)}_t(R) - r\eta| + |r^{(G)}_t(B) - (1-r)\eta|\big]}{t} < \infty, \tag{129}$$
which directly implies (90).
We claim the following sufficient condition. For a stochastic process $\{w_t, t > 0\}$, to show that $\lim_{T\to\infty}\sum_{t=1}^{T}\mathbb{E}[|w_t|]/t < \infty$, it is enough to have, for some $\delta, c > 0$ and any $x > 0$,
$$\mathbb{P}(|w_t| > x) \le \exp\big(-cx^2t^{\delta}\big). \tag{130}$$
This is because (130) implies $\mathbb{E}[|w_t|] = \int_0^\infty \mathbb{P}(|w_t| > x)\,dx = O(t^{-\delta/2})$, which makes $\mathbb{E}[|w_t|]/t$ summable.
By (77) and (72), $r^{(E,M)}_t(R) - r$ and $r^{(G)}_t(R) - r\eta$ satisfy the tail bound (130) for any $0 < \delta < \min(1, 2\alpha)$ and all sufficiently large $t$, since $t^{\min(1,2\alpha)}/\log t \ge t^{\delta}$ eventually. $r^{(G)}_t(B) - (1-r)\eta$ also satisfies it, since it behaves in the same way as $r^{(G)}_t(R) - r\eta$. This finishes the proof.

G.3 Proof of Lemma E.2

Proof.

We define $K(x)$ as
$$K(x) := (F(x) - x)\big(1 - (1-\rho^{(p)}_R)\xi(1-x) - (1-\rho^{(u)}_R)(1-\xi)(1-r)\big)\big(1 - (1-\rho^{(p)}_B)\xi x - (1-\rho^{(u)}_B)(1-\xi)r\big). \tag{131}$$
By the definition of $F(x)$, it is easy to see that $K(x)$ is a degree-3 polynomial with a positive coefficient for the $x^3$ term. Therefore, $\lim_{x\to-\infty} K(x) = -\infty$ and $\lim_{x\to\infty} K(x) = \infty$. A degree-3 polynomial has at most 3 real roots. So if $K(0) > 0$ and $K(1) < 0$, then $K(x)$ has exactly one root in $(0,1)$. Moreover, for $x \in [0,1]$, since $\rho^{(p)}_R, \rho^{(p)}_B > 0$,
$$\frac{K(x)}{F(x) - x} \ge \big(1 - (1-\rho^{(p)}_R)\xi - (1-\rho^{(u)}_R)(1-\xi)(1-r)\big)\big(1 - (1-\rho^{(p)}_B)\xi - (1-\rho^{(u)}_B)(1-\xi)r\big) \tag{132}$$
$$> \big(1 - \xi - (1-\xi)(1-r)\big)\big(1 - \xi - (1-\xi)r\big) \tag{133}$$
$$= (1-\xi)r\,(1-\xi)(1-r) \ge 0, \tag{134}$$
which implies that $F(x) - x$ and $K(x)$ have the same sign in $(0,1)$. Hence, if $K(0) > 0$ and $K(1) < 0$, then $F(x) - x$ has exactly one root $\alpha^*$ in $(0,1)$. Moreover, for $x \in (0,1)$, $F(x) - x > 0$ if $x < \alpha^*$, and $F(x) - x < 0$ if $x > \alpha^*$. This implies that $F(x) - F(\alpha^*) < x - \alpha^*$ if $x > \alpha^*$, and $F(x) - F(\alpha^*) > x - \alpha^*$ if $x < \alpha^*$. It follows that for $x \in [0,1]$ with $x \ne \alpha^*$,
$$0 \le \left|\frac{F(x) - F(\alpha^*)}{x - \alpha^*}\right| < 1. \tag{135}$$
One can check that $|F'(\alpha^*)| < 1$. Now take the supremum over $x$ in the above inequality. The function $|(F(x) - F(\alpha^*))/(x - \alpha^*)|$ is continuous, so the supremum is achieved at some point $x_0$. If $x_0 \ne \alpha^*$, we can set $\gamma = |(F(x_0) - F(\alpha^*))/(x_0 - \alpha^*)| < 1$; if $x_0 = \alpha^*$, we can set $\gamma = |F'(\alpha^*)| < 1$.
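Lemma E.2 reduces (62)–(63) to locating the fixed point of $F$ and bounding its secant slopes around that point, and a small numerical sketch makes this concrete. The Python code below is an illustration under hypothetical parameter values that are not taken from the paper. It builds $F$ from (81) and finds $\alpha^*$ by bisection on $F(x) - x$. Bisection needs $F(0) > 0$ and $F(1) < 1$, which are equivalent to $K(0) > 0$ and $K(1) < 0$ because the two denominators are positive. The code then estimates $\gamma$ as the largest secant slope $|F(x) - F(\alpha^*)|/|x - \alpha^*|$ over a grid; this is a numerical estimate, not a proof. Finally, it evaluates $C_{R,1}$ and $C_{B,1}$ from (26)–(27) together with $\beta(R) = 1 + 1/C_{R,1}$ and $\beta(B) = 1 + 1/C_{B,1}$ from Theorem 5.1. The helper names `make_F` and `fixed_point` are hypothetical.

```python
"""Numerical sketch of Lemma E.2 and Theorem 5.1 under hypothetical parameters."""


def make_F(r, eta, xi, rho_p_R, rho_p_B, rho_u_R, rho_u_B):
    """Return F from (81) together with its two denominators D_R and D_B."""

    def D_R(x):
        # 1 - (1 - rho_R^(p)) xi (1 - x) - (1 - rho_R^(u)) (1 - xi) (1 - r)
        return 1 - (1 - rho_p_R) * xi * (1 - x) - (1 - rho_u_R) * (1 - xi) * (1 - r)

    def D_B(x):
        # 1 - (1 - rho_B^(p)) xi x - (1 - rho_B^(u)) (1 - xi) r
        return 1 - (1 - rho_p_B) * xi * x - (1 - rho_u_B) * (1 - xi) * r

    def F(x):
        red = r * (1 - eta) * (xi * x + (1 - xi) * r) / D_R(x)
        blue = (1 - r) * (1 - eta) * (rho_p_B * xi * x + rho_u_B * (1 - xi) * r) / D_B(x)
        return r * eta + red + blue

    return F, D_R, D_B


def fixed_point(F, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on F(x) - x, which must be positive at lo and negative at hi."""
    if not (F(lo) - lo > 0 > F(hi) - hi):
        raise ValueError("F(0) > 0 and F(1) < 1 are needed to bracket the root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameter values, chosen only for illustration.
params = dict(r=0.3, eta=0.2, xi=0.7, rho_p_R=0.5, rho_p_B=0.5, rho_u_R=0.8, rho_u_B=0.8)
F, D_R, D_B = make_F(**params)
alpha_star = fixed_point(F)

# Grid estimate of the contraction factor gamma in (82).
grid = [i / 1000 for i in range(1001)]
gamma = max(
    abs((F(x) - F(alpha_star)) / (x - alpha_star))
    for x in grid
    if abs(x - alpha_star) > 1e-9
)

# C_{R,1}, C_{B,1} from (26)-(27); D_R(alpha*) and D_B(alpha*) are their denominators.
r, eta, xi = params["r"], params["eta"], params["xi"]
C_R1 = (r * (1 - eta) * xi / D_R(alpha_star)
        + (1 - r) * (1 - eta) * params["rho_p_B"] * xi / D_B(alpha_star))
C_B1 = ((1 - r) * (1 - eta) * xi / D_B(alpha_star)
        + r * (1 - eta) * params["rho_p_R"] * xi / D_R(alpha_star))
beta_R, beta_B = 1 + 1 / C_R1, 1 + 1 / C_B1

print(f"alpha* = {alpha_star:.6f}, gamma (grid estimate) = {gamma:.3f}")
print(f"beta(R) = {beta_R:.3f}, beta(B) = {beta_B:.3f}")
```

For these particular values, the red groups come out with the steeper tail ($C_{R,1} < C_{B,1}$, hence $\beta(R) > \beta(B)$). This is the condition assumed in Lemma F.2. Other parameter choices can reverse it, so the values carry no significance beyond illustrating the formulas.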