The Rise and Fall of a Central Contributor: Dynamics of Social Organization and Performance in the Gentoo Community
Marcelo Serrano Zanetti, Ingo Scholtes, Claudio Juan Tessone, Frank Schweitzer
Marcelo Serrano Zanetti, Ingo Scholtes, Claudio Juan Tessone and Frank Schweitzer: The Rise and Fall of a Central Contributor: Dynamics of Social Organization and Performance in the Gentoo Community. To appear in the proceedings of the 6th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2013) - ICSE 2013 Workshop
Chair of Systems Design, ETH Zurich, Switzerland
Abstract
Social organization and division of labor crucially influence the performance of collaborative software engineering efforts. In this paper, we provide a quantitative analysis of the relation between social organization and performance in Gentoo, an Open Source community developing a Linux distribution. We study the structure and dynamics of collaborations as recorded in the project's bug tracking system over a period of ten years. We identify a period of increasing centralization after which most interactions in the community were mediated by a single central contributor. In this period of maximum centralization, the central contributor unexpectedly left the project, thus posing a significant challenge for the community. We quantify how the rise, the activity as well as the subsequent sudden dropout of this central contributor affected both the social organization and the bug handling performance of the Gentoo community. We analyze social organization from the perspective of network theory and augment our quantitative findings by interviews with prominent members of the Gentoo community who shared their personal insights.
An important prerequisite for the success of Open Source Software (OSS) projects is the ability to build a sufficiently large and stable community of users and contributors. While actual source code is typically contributed by a rather small and stable set of core developers, the wider - and possibly more diverse - community plays an important role in processes related to software quality management. Here, most OSS projects rely on a large number of part-time contributors who report bugs, triage pending bug reports or provide support and solutions for issues reported by others. This community effort in handling bug reports does not only unburden developers; it also significantly increases software quality, thus bearing the potential to attract more users. Furthermore, in [17] it was argued that - despite their volatility - bug handling communities are a common entry point for long-term contributors who - after getting insight into a project's organizational and technical structures - may eventually become members of the core developer team. As such, the structure and dynamics of bug handling communities are of particular importance for the success of OSS projects. In order to ensure a timely response to bug reports, the management of the project has to find efficient organizational structures and a reasonable division of labor, despite the fact that these communities are typically highly heterogeneous in terms of dedication and skills.

In this paper, we present a quantitative analysis of the structure and dynamics of the bug handling community of Gentoo, an OSS project developing a Linux distribution. Our study is based on a data set covering more than , collaboration events recorded by the project's Bugzilla installation over a period of more than ten years. The contributions of our study are as follows:

• We study collaboration structures of the Gentoo bug handling community by applying quantitative measures that capture cohesion, centralization, clustering and communication efficiency. Our analysis reveals a period of increasing centralization and decreasing cohesion that resulted in a situation where most interactions in the community were mediated by a single central contributor.
• In the period of maximum centralization the central contributor unexpectedly left the project. We analyze the implications for the project's social organization, which include a temporary loss of cohesion as well as subsequent efforts to reorganize the community.
• We complement our study by an analysis of the community's performance in terms of bug handling efficiency and response time. Our findings suggest that the performance improved during the active period of the central contributor, while her retirement had a lasting negative effect on bug handling efficiency and response time.
• We substantiate our quantitative findings by personal insights into the social dynamics of the Gentoo community provided by three long-term contributors. These insights support our findings and highlight potential applications of our quantitative measures in the monitoring of collaboration structures in OSS projects.

The remainder of this paper is organized as follows. In section 2 we summarize relevant related work studying the structure of OSS communities and its impact on performance. In section 3 we introduce data collection and network analysis methods that form the basis of our case study. In section 4 we present quantitative results on the evolution of the social organization, as well as bug handling performance in the Gentoo community. We further interpret our findings, align them with personal insights shared by prominent community members and discuss threats to validity. Finally, in section 5 we summarize our contributions and highlight future research on the application of network-based analysis methods in the management of software development communities.
The question of how the structure and dynamics of social organization influence the performance and success of collaborative software development efforts has been studied by researchers from different fields using a variety of methods. Due to the availability of data, many of these studies address OSS communities, which consist of users, developers and other contributors, who contribute to the project in terms of documentation, maintenance of web sites or the submission and handling of bug reports. Members of such communities typically need to self-organize in a way that guarantees information flow as well as a coordinated allocation of tasks and responsibilities. The processes and structures of this self-organization have been studied in a number of works.

Since they play a central role in software quality assurance, bug handling communities have been the subject of many studies. Compared to the development of source code, in [8] it was found that the bug handling process is based on the contributions of a much wider community. In a recent work presented in [17], this community has further been shown to be an important entry point for long-term contributors and developers. As an important finding, lack of attention paid to bug reporters and fast negative feedback by the community decrease the likelihood that such users contribute to the project for a long period. This is partly in line with arguments about the negative impact of a too strict duplicate bug policy in bug handling communities put forth in [1].

The collaboration structures emerging in bug handling communities can be extracted by different means. Communication topologies of the bug handling communities of OSS projects hosted on SourceForge have been analyzed in [2]. Here it was shown that large projects - measured in terms of the number of contributors - tend to have lower degrees of centralization in communication. The authors further call for a detailed longitudinal analysis of changes in the social organization of OSS projects during periods of growth. Our work complements this study in the sense that we a) analyze the dynamics of centralization during a phase of growth in the Gentoo community and b) show the impact of increasing centralization on community performance and cohesion.

In addition to studies at the level of the community, the relationship between the network position of contributors and their individual success (like e.g. the number of bug reports leading to bug fixes) has been studied in [5]. The authors find that both the centrality of contributors and their embedding in cohesive clusters of communication have beneficial effects on the bug fixing performance. A similar finding has been presented in [16], which studies the impact of social aspects on individual performance in bug handling communities. Our paper complements this work in the sense that we study network-wide measures of communication efficiency and centralization, their dynamics during community growth, as well as their relation to the bug handling performance of the community. We further highlight potential risks associated with the presence of central contributors in situations when these contributors leave the community unexpectedly.

The relationship between communication structures and success at the level of teams was studied in [14]. Here it was shown that positive team performance is related to communication structures that facilitate information dissemination. However, no clear relation between differences in the coordination practices and the project success could be identified. In [15], the dynamics of collaboration structures of OSS communities has been studied. Similarly, in [3], coordination practices of the bug handling process have been studied for four OSS communities. The authors found that contributions are not distributed equally and that the community is organized in a core-periphery structure. Unequal division of labor and an increasing degree of centralization are compatible with the findings about the rise of a leader presented in [7]. Here, a leader is defined as a contributor who consistently provides high quality contributions, coordinates efforts [11] and around whom the community is centered [6]. Usually, leadership in OSS projects is shared between several contributors. The analysis performed in [10] shows that overdependence on a leader results in an unstable situation where the project may - initially - accelerate its development, but which may end up saturating the leader.

The present paper extends these previous works in the following way. First, we study the dynamics of a more comprehensive set of network measures that can be interpreted in terms of cohesion, centralization and communication efficiency. We particularly study how the social organization of the Gentoo community evolves during an initial phase of growth and a subsequent phase of increasing centralization that is due to the presence of a central contributor. We then relate our results to proxies for community performance and study how both performance and social organization are impacted by the loss of a central contributor. Finally, we interpret and substantiate our findings by means of insights from actual contributors to the Gentoo community.
In our study of the dynamics of social organization in the bug handling community of Gentoo, we use the project's installation of the Bugzilla bug tracker as data source. We first describe our process of retrieving data and extracting evolving collaboration networks. We then introduce the quantitative measures applied in our analysis of collaboration networks and briefly comment on their interpretation in the context of OSS projects. Furthermore, we summarize how we selected three community members in order to substantiate our findings by means of personal insights from former and active contributors to the Gentoo project.
In January 2002, the Gentoo community started to use the Bugzilla bug tracking system. The full history of all bug reports submitted since then is recorded in the database of the project's Bugzilla installation. Data available for each of these bug reports include the history of all updates to any field along with time stamps and the ID of the user who applied the update. In the context of our analysis, we in particular extract the ID of the user who initially submitted a bug report, as well as the time of the submission and the status of a bug report, like e.g. unconfirmed, pending, reproduced or resolved. For those bugs whose final status was set to resolved, we additionally collected the resolution field of the report, which can take one of the values fixed, duplicate, invalid, needinfo or wontfix. An entry fixed refers to those bugs for which the community eventually provided a fix. Bug reports whose resolution field was set to duplicate were identified to be duplicates of an existing bug report that refers to the same issue. Bugs with the final resolution invalid are those that do not refer to actual software issues, instead referring for instance to a misunderstanding on the user's side. If a bug report is incomplete in the sense that it lacks important information that would allow the underlying issue to be reproduced or fixed and if the reporting user fails to provide the necessary information within a certain time, the resolution field of a bug report is set to needinfo. Finally, those bug reports that are valid and complete, but that nevertheless cannot be fixed either due to a lack of resources or the fact that the issue is due to an external dependency, are marked as wontfix. The fact that all changes to the resolution field of a bug report as well as the submission of the bug report itself are associated with a precise time stamp further allows us to compute the number of bugs that were submitted or resolved with a given status within a given period of time. In addition to all updates that relate to the resolution status of a bug, we also extracted the full history of the assignee and the cc fields of each bug report. The assignee field contains the ID of the user who was made responsible for providing a solution for a particular bug report, while the cc field contains a list of user IDs that are being notified about any future updates on a particular bug. All of the data were collected via the public API of the Gentoo project's Bugzilla installation. In total, we retrieved data on , bug reports and , change events recorded between January 1st 2002 and April 26th 2012. Some statistics of the data set, including the fraction of resolved bugs falling in each of the aforementioned resolution categories, are shown in Table 1.

A core aspect of our study is the quantitative analysis of the collaboration structure of the Gentoo community during particular periods of time. Even though our data set contains the full record of updates to bug reports, for the construction of collaboration networks, we limit our study to those update events that unambiguously capture dyadic social interactions between two contributors. In particular, for each addition of a user ID to the cc and assignee field of a bug report, we infer a dyadic interaction between the contributor performing the change and the user whose ID was added to the field. We further associate this interaction with the time stamp of the associated update of the bug report. Focusing on updates to the cc and assignee fields of bug reports necessarily provides a limited perspective on the social organization of a community. We decided to neglect additional data like e.g. the sequence of comments on bugs, for which an inference of directed interaction networks is more difficult and error-prone. We argue that the collaboration networks resulting from our construction procedure are nevertheless insightful. The fact that a contributor A adds contributor B to the cc field of a bug indicates that A is aware of B and that A knows about the interests or competencies of B. Furthermore, the fact that contributor X adds Y to the assignee field of a bug report highlights that these contributors have different roles in the community, like e.g. X identifying the cause of an issue and assigning it to Y.

Excluding those change events where contributors added themselves to the cc or assignee field, we infer more than , directed interactions between different members of the Gentoo community. The structure and dynamics of these interactions can be studied in terms of a collaboration network in which nodes represent contributors and directed edges represent interactions between them. A quantitative analysis of such network structures can reveal interesting insights into the community's organization.
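The inference of directed interactions described above can be sketched as follows. This is a minimal illustration rather than the extraction code used in the study: the record type ChangeEvent and its field names are hypothetical and do not reflect the schema of the Bugzilla API.

```python
from collections import namedtuple

# Hypothetical record layout for one update to a bug report (an assumption,
# not the Bugzilla schema): who changed which field, when, and which user ID
# was added by the change.
ChangeEvent = namedtuple("ChangeEvent", "timestamp actor field added_user")


def infer_interactions(events):
    """Return (timestamp, source, target) triples for cc/assignee additions."""
    interactions = []
    for ev in events:
        if ev.field not in ("cc", "assignee"):
            continue  # other updates are not treated as dyadic interactions
        if ev.actor == ev.added_user:
            continue  # contributors adding themselves are excluded
        interactions.append((ev.timestamp, ev.actor, ev.added_user))
    return interactions


events = [
    ChangeEvent(1, "A", "cc", "B"),          # A adds B to the cc field
    ChangeEvent(2, "X", "assignee", "Y"),    # X assigns the bug to Y
    ChangeEvent(3, "B", "cc", "B"),          # self-addition, dropped
    ChangeEvent(4, "A", "resolution", None), # not a cc/assignee update, dropped
]
interactions = infer_interactions(events)
```

Each resulting triple corresponds to one time-stamped directed edge from the contributor performing the change to the contributor who was added.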
Rather than aggregating all interactions occurring over a period of ten years, we further utilize the fact that all interactions inferred from our data set are time-stamped. In particular, we define a time window of days, filter out all interactions whose time stamps are outside this time window and construct a network from all remaining interactions (see an illustration of this procedure in Figure 1). Starting on the first day of the observation period, we then progressively slide the start date of this time window by one day increments. This sliding window approach yields a sequence of , networks, each of them representing the collaboration structures of the community within a day period starting at a particular day. By analyzing this sequence of networks, we obtain a time series of network measures that capture the dynamics of social organization. It is important to note that the collaboration networks obtained in the way described above are not necessarily connected, i.e. they may consist of different disconnected components. In order to still provide a single measure that can be compared to previous and subsequent snapshots, we limit our analysis to the network's largest connected component (LCC). We additionally measure the size of the LCC and indicate its relative size in terms of the fraction of all nodes that are connected to the LCC.

In the following, we briefly introduce a number of quantitative, network-theoretic measures that we found to capture interesting aspects of the dynamics of social organization in the Gentoo [...] igraph [4].

Table 1: Basic statistics of the Bugzilla data set used for this study (columns: Statistic, Gentoo).
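The sliding window construction and the relative size of the LCC can be sketched as below. The sketch rests on several assumptions that are not taken from the study: time stamps are integer day indices, the window length is a free parameter (length), and components are determined while ignoring edge direction (weak connectivity). All function names are illustrative.

```python
from collections import defaultdict


def window_edges(interactions, start, length):
    """Directed edges whose time stamp t satisfies start <= t < start + length."""
    return {(u, v) for t, u, v in interactions if start <= t < start + length}


def largest_component(edges):
    """Return (node set of the largest component, total number of nodes).

    Assumption: edge direction is ignored when determining components.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for source in adj:
        if source in seen:
            continue
        component, stack = {source}, [source]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in component:
                    component.add(nbr)
                    stack.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    return best, len(adj)


def relative_lcc_series(interactions, first_day, last_day, length):
    """Relative LCC size for every window start day in [first_day, last_day]."""
    series = []
    for start in range(first_day, last_day + 1):
        lcc, n_nodes = largest_component(window_edges(interactions, start, length))
        series.append(len(lcc) / n_nodes if n_nodes else 0.0)
    return series


# Toy data: (day, source, target)
toy = [(0, "a", "b"), (1, "b", "c"), (2, "d", "e"), (5, "a", "c")]
series = relative_lcc_series(toy, first_day=0, last_day=3, length=3)
```

Sliding the start day by one yields heavily overlapping windows, so consecutive values of such a series change gradually rather than independently.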
Closeness centralization
The first measure that we applied in our analysis is closeness centrality. The normalized closeness centrality of a node can be defined based on the inverse of the sum of the shortest path lengths to all other nodes in the network. As such, it captures the centrality of a node in terms of how close it is to all other nodes in the network. Based on the distribution of closeness centralities of all nodes, one can furthermore define the so-called closeness centralization of a network. This network-wide measure captures the degree to which the topology is centralized. In a (maximally centralized) star network it takes a maximum value of 1 while it is 0 for networks in which all shortest paths between all pairs of nodes have the same length (like e.g. a fully connected topology). In the context of our analysis, the closeness centralization of a collaboration network captures to what degree contributors have the same importance for indirect information exchange. Precisely, in a network with maximum closeness centralization all collaborations are mediated by a single individual, while in networks with smaller closeness centralization community members have more equal roles.
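The two extreme cases mentioned above can be checked with a small sketch. It assumes an undirected, connected network given as a dictionary of neighbor sets and uses Freeman's normalization, in which the sum of differences to the most central node is divided by its theoretical maximum (n-1)(n-2)/(2n-3), attained by a star. Whether the study used exactly this variant is an assumption.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path lengths (in hops) from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness_centralization(adj):
    """Freeman closeness centralization of a connected, undirected graph."""
    n = len(adj)
    if n < 3:
        return 0.0  # centralization is not meaningful for fewer than 3 nodes
    # Normalized closeness: (n - 1) divided by the sum of distances to all others.
    closeness = [(n - 1) / sum(bfs_distances(adj, v).values()) for v in adj]
    c_max = max(closeness)
    # Theoretical maximum of the numerator, attained by a star on n nodes.
    max_sum = (n - 1) * (n - 2) / (2 * n - 3)
    return sum(c_max - c for c in closeness) / max_sum


leaves = "abcd"
star = {"hub": set(leaves), **{x: {"hub"} for x in leaves}}
complete = {x: set("abcde") - {x} for x in "abcde"}

star_value = closeness_centralization(star)
complete_value = closeness_centralization(complete)
```

For the star, every leaf reaches the others only through the hub, which is the situation the text describes as all collaborations being mediated by a single individual.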
Clustering coefficient
The clustering coefficient of a network measures how closely community members interact with each other in the sense that interactions between users A and B, as well as between B and C, will also entail a direct interaction between the users A and C. This property of a network can be quantified at the level of nodes by computing the fraction of those pairs of a node's neighbors u and v that are connected by a direct link (u, v). By averaging the clustering coefficient scores of all nodes it is possible to measure the global clustering coefficient of a network. In the context of our analysis, the (mean) clustering coefficient of a monthly collaboration network captures how cohesive the community is in terms of contributors being embedded in collaborating clusters. In other words, this measure captures to what extent two collaborators also collaborate with other collaborators of their peers.
Degree Assortativity
The degree assortativity of a network measures the preference of its nodes to connect to peers that have a similar or different number of connections (degree). Networks in which nodes are preferentially connected to nodes with similar degree are called assortative. A positive degree assortativity indicates a positive correlation between the degrees of neighboring nodes. Networks in which nodes are preferentially connected to nodes with different (i.e. smaller or higher) degree are called disassortative, in which case degree assortativity is negative. Zero degree assortativity means that there is no correlation between the degrees of connected nodes, i.e. nodes do not exhibit a preference for one or the other. In our analysis, we use degree assortativity to capture the contributors' preference to collaborate with other contributors that are - from the perspective of the number of collaborations - of similar or different importance than themselves.
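The mean clustering coefficient and the degree assortativity introduced above can be sketched for small undirected networks as follows. The sketch makes two assumptions that the text does not settle: nodes with fewer than two neighbors contribute a clustering score of 0, and assortativity is computed as the Pearson correlation of the degrees found at the two ends of each edge, counting every edge in both directions. Function names are illustrative.

```python
def undirected(edges):
    """Build a dictionary of neighbor sets from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def mean_clustering(adj):
    """Average over all nodes of the fraction of connected neighbor pairs."""
    scores = []
    for nbrs in adj.values():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            scores.append(0.0)  # assumption: such nodes count as 0
            continue
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        scores.append(links / (k * (k - 1) / 2))
    return sum(scores) / len(scores)


def degree_assortativity(adj):
    """Pearson correlation between the degrees at both ends of each edge."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:  # every undirected edge is visited in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    m = len(xs)
    mean = sum(xs) / m  # xs and ys have the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    if var == 0:
        raise ValueError("undefined when all nodes have the same degree")
    return cov / var


star = undirected([("hub", x) for x in "abcd"])
triangle_with_tail = undirected([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
path = undirected([(1, 2), (2, 3), (3, 4)])

star_clustering = mean_clustering(star)
tail_clustering = mean_clustering(triangle_with_tail)
star_assortativity = degree_assortativity(star)
path_assortativity = degree_assortativity(path)
```

A star is the extreme case of both effects discussed in this paper: no two neighbors of the hub are linked, and every edge joins the highest-degree node to a lowest-degree node.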
Algebraic Connectivity
An interesting aspect of network analysis is that the influence of a network topology on dynamical processes like e.g. information flow, cascading failures or synchronization phenomena can be captured by means of so-called spectral properties. One important measure in this line is the so-called algebraic connectivity of a network. This scalar property particularly captures whether the topology contains small cuts, i.e. whether all shortest paths between different parts of the network pass through a small number of edges. The existence of such small cuts is known to hinder information spreading and synchronization [13]. At the same time, it can be seen as an indicator for robustness since it captures the effect of a failure of a small number of nodes and associated links. The algebraic connectivity is defined as being the second smallest eigenvalue of the so-called Laplacian matrix, which is defined as the difference D − A between a diagonal matrix D in which the diagonal elements represent the degrees of nodes and the adjacency matrix A of the network topology. The algebraic connectivity of a network is greater than 0 iff the network topology is connected, i.e. iff a path exists between all pairs of nodes. This is a corollary to the fact that the number of times 0 appears as an eigenvalue of the Laplacian matrix is equal to the number of the network's connected components. In the context of this paper we use algebraic connectivity to measure the communication efficiency and robustness of the community's collaboration structure.

In order to substantiate our quantitative findings with insights into the community, we contacted a number of long-term contributors to the Gentoo bug handling community. We received three very insightful and detailed replies, which serve as an external validation for our quantitative findings. We omit the real names of the contributors and refer to them as Alice, Bob and Chris instead. Alice was the - by far - most central contributor to the Gentoo bug handling community in the period between October 2004 and March 2008. She was effectively handling most of the bug reports, until she left the project suddenly in March 2008. Bob was involved in a community initiative to establish formal procedures regarding the submission and handling of bug reports that were - in part - necessitated by the departure of Alice. Chris is another long-term contributor to the project, second only to Alice in terms of cumulative contributions to the bug handling process. In our questionnaire, we asked for personal insights regarding the following questions:

• What was the impact of the central contributor Alice on the involvement of other contributors and project performance?
• What were the reasons for Alice leaving the project?
• What was the motivation for the establishment of formal procedures for the bug handling process?
• Was this initiative successful in terms of improving the performance of the community?
• What implications did the establishment of formal procedures have for the social organization of the community?
In the following, we study the dynamics of Gentoo's bug handling community during the time between 2002 and 2012. Our methodology is based on a network-theoretic analysis of collaboration networks by means of the measures discussed in section 3.

Based on the activity of the central contributor Alice, we divide the observation period into three periods P1, P2, P3. In period P1, between January 2002 and October 27, 2004, Alice was not yet active and the community was growing. During the second period P2, starting on October 28, 2004, Alice gradually became the most central contributor. She unexpectedly left the community after her last contribution on March 29, 2008, which marks the start of the third period P3 in which Alice was not active anymore. In the following sections, we show how community cohesion, centralization and performance evolved in these three periods.
We first focus on the size of the largest connected component (LCC) of the respective monthly collaboration networks. The relative size of the LCC (i.e. the fraction of all nodes belonging to the LCC) is shown in Figure 2(b). Since it captures how many of the contributors were disconnected from the rest of the community, this measure can be seen as a proxy for the cohesion of the community. In Figure 2(b), period P2 is highlighted in green. As one can see, there is no significant difference between the periods P1 and P2 in terms of the relative size of the LCC; it rather remains stable around a value of . However, remarkable dynamics can be seen in period P3 after Alice had left the community: After a small drop, one observes a steady increase in the relative size of the LCC starting around the end of 2008. The relative size of the LCC eventually reaches around the end of 2011.

Another cohesion-related measure is the average node degree in the monthly collaboration networks, i.e. the average number of different community members a contributor was collaborating with during one month. In Figure 2(d), one observes a fast decrease of this measure during period P2, when the central contributor Alice was active. Remarkably, it was increasing during the periods P1 and P3, when Alice was not active.

Apart from the relative size of the LCC, a further interesting question is how efficient and robust the collaboration structures are within the largest connected component. For this, we compute the algebraic connectivity of the LCC, a measure from spectral graph theory that captures how well-connected a topology is. As argued in section 3, networks with larger algebraic connectivity a) facilitate information flow and synchronization processes and b) are more robust against the loss of nodes and links. The dynamics of algebraic connectivity is shown in Figure 2(e).
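To make the link between centralization and algebraic connectivity concrete, the following worked example compares two idealized topologies on n nodes. It is an illustration under the assumption of undirected, unweighted networks and uses the definition L = D − A from section 3, with λ2 denoting the second smallest eigenvalue of L.

```latex
\begin{align*}
L_{\text{star}} &=
  \begin{pmatrix} n-1 & -\mathbf{1}^{\top} \\ -\mathbf{1} & I_{n-1} \end{pmatrix},
& \operatorname{spec}(L_{\text{star}}) &= \{0,\ \underbrace{1,\dots,1}_{n-2},\ n\},
& \lambda_2 &= 1, \\
L_{\text{complete}} &= n\,I_n - \mathbf{1}\mathbf{1}^{\top},
& \operatorname{spec}(L_{\text{complete}}) &= \{0,\ \underbrace{n,\dots,n}_{n-1}\},
& \lambda_2 &= n .
\end{align*}
```

For n ≥ 3, a star in which all interactions pass through one node therefore has algebraic connectivity 1 regardless of its size, whereas in a fully connected network it grows with n. Both values are positive because both topologies are connected.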
Comparingperiod P to P and P , one observes that the presence of Alice decreased both the variance and10/21 arcelo Serrano Zanetti, Ingo Scholtes, Claudio Juan Tessone and Frank Schweitzer:The Rise and Fall of a Central Contributor:Dynamics of Social Organization and Performance in the
Gentoo
Communityto appear in the proceedings of the 6th International Workshop on Cooperative and Human Aspects ofSoftware Engineering (CHASE 2013) - ICSE 2013 Workshop(a) network size (b) relative size (c) relative size (without
Alice )(d) mean degree (e) algebraic connectivity (f) algebraic con. (without
Alice )(g) clustering coefficient (h) assortativity
Figure 2: Dynamics of size and cohesion of the
Gentoo bug handling community. Period P during which the central contributor Alice was active is highlighted in green.11/21 arcelo Serrano Zanetti, Ingo Scholtes, Claudio Juan Tessone and Frank Schweitzer:The Rise and Fall of a Central Contributor:Dynamics of Social Organization and Performance in the
Gentoo
Communityto appear in the proceedings of the 6th International Workshop on Cooperative and Human Aspects ofSoftware Engineering (CHASE 2013) - ICSE 2013 Workshop the mean of the algebraic connectivity. A straight-forward interpretation of this finding is that- as
Alice was involved in many of the collaborations - the collaboration network's robustness decreased. Furthermore, as most collaborations were mediated through her, the potential for congestion at this particular node increased, thus effectively decreasing the communication efficiency of the topology.

Another interesting question from the perspective of social organization is to what degree two contributors that collaborated with a third contributor also collaborated with each other. This is captured by the clustering coefficient of a network, whose dynamics is shown in Figure 2(g). The dramatic decrease of the clustering coefficient during period P2 and the gradual increase in period P3 highlight the mediator role played by Alice. As Alice was involved in most of the collaborations, direct connections between users collaborating with her seemingly became unnecessary. Another signature of the community's tendency to preferentially collaborate with the most central collaborator can be seen in Figure 2(h). As described in section 3, the assortativity captures the preference of contributors to collaborate with other contributors that are more or less important than themselves. A significant decrease of assortativity towards more strongly negative values can be seen in period P2, when Alice was active. This substantiates the assumption that most community members primarily collaborated with the most central collaborator, while collaborations between contributors with similar importance decreased.

A particular concern one may have about the analysis presented above is that it is unclear to what extent it is the presence of Alice that affects the dynamics of network measures. One may suspect that it is the mere number of collaborations involving her that increasingly dominates the community, while the existing collaboration structures are left more or less untouched. In order to avoid this pitfall, we additionally ran our analysis on all monthly collaboration networks after removing Alice as well as all interactions in which she was involved. We then computed the relative size of the LCC and the algebraic connectivity of the residual networks. Compared to Figures 2(b) and 2(e), a clear difference can only show up during period P2 if Alice's presence indeed impacted the residual collaboration structures. The plots of the relative size of the LCC (Figure 2(c)) and algebraic connectivity (Figure 2(f)) of the residual collaboration networks highlight that the activity of
Alice during period P2 significantly changed the organization of the community. We particularly observe that - for the residual network - the fraction of users connected to the LCC dropped significantly over a period of two years. Furthermore, algebraic connectivity of the residual network experienced a significant drop, thus highlighting that during Alice's presence the residual collaboration topology became less well-connected.

To visually illustrate the quantitative findings about the evolution of collaboration structures provided above, in Figure 3 we additionally show four representative examples of monthly collaboration networks: one for each of the periods P1 (Figure 3(a)), P2 (Figure 3(b)) and P3 (Figure 3(c)), as well as the network that results from removing Alice from the network depicted in Figure 3(b) (Figure 3(d)).

Figure 3: Illustration of representative monthly collaboration networks. Panels: (a) October 2004, (b) July 2006, (c) May 2008, (d) July 2006 (without Alice).

From our quantitative study of the evolution of collaboration structures in the Gentoo community, we can draw the following observation:

Observation: During the presence of the central contributor Alice, cohesion in the Gentoo bug handling community decreased.
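The residual-network check and two of the cohesion measures used in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the clustering coefficient is taken to be the global variant (closed two-paths over all two-paths), assortativity is taken to be the Pearson correlation of degrees across edges, and all function names and toy networks are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def adjacency(edges):
    """Undirected adjacency map; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def without_node(edges, node):
    """Residual edge list: drop every interaction that involves `node`."""
    return [(u, v) for u, v in edges if node not in (u, v)]


def transitivity(adj):
    """Global clustering coefficient: closed two-paths / all two-paths."""
    closed = total = 0
    for neighbors in adj.values():
        for a, b in combinations(neighbors, 2):
            total += 1
            if b in adj[a]:
                closed += 1
    return closed / total if total else 0.0


def degree_assortativity(adj):
    """Pearson correlation of the degrees found at the two ends of each edge."""
    xs, ys = [], []
    for u, neighbors in adj.items():
        for v in neighbors:  # every undirected edge is visited in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    if not xs:
        return 0.0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0


# Hypothetical toy networks: a pure star, and a star with one extra link.
star = [("alice", "u1"), ("alice", "u2"), ("alice", "u3")]
toy = star + [("u1", "u2")]
residual = without_node(toy, "alice")  # only the u1-u2 link survives
```

In the pure star, no two partners of the hub are linked and all edges connect a high-degree node to low-degree nodes, so the sketch yields zero clustering and maximally negative assortativity. Note that contributors who interacted only with the removed node disappear from a residual network built this way; whether they should instead be kept as isolated nodes is a modelling choice that this section does not settle.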
A particularly important mechanism that could explain the loss of cohesion in the community is an increasing centralization of communication. In this section we analyze the changes in centralization in the Gentoo community. We do not only study centralization from a network perspective, i.e. the increase of the topological centrality of one particular node. We also consider the effects on the number of contributors that were involved in the bug handling process in terms of assigning bug reports or forwarding information.

We first analyze the degree of centralization in the Gentoo community from the perspective of closeness centralization. As argued in section 3, this measure captures to what extent the roles of contributors differ in terms of having short paths to all other community members. The dynamics of closeness centralization shown in Figure 4(a) exhibits a decreasing tendency during period P1. A comparison to the dynamics of community size during P1 (see Figure 2(a)) highlights that the growth of the community coincided with a decrease in centralization, which is in line with the findings of [2]. However, the decrease in closeness centralization in period P1 was followed by a significant increase during period P2, when Alice became active. This increase spans period P2 from its start in October 2004 until its end in March 2008. When Alice left the community, closeness centralization experienced a sudden drop and then fluctuated around a roughly constant value during period P3.

The finding that the collaboration structures became more centralized during period P2 is complemented by Figure 4(b), which shows the number of different contributors assigning a bug report to another contributor within a given time window. This number is of particular interest, as it captures how many contributors were actually involved in the bug triaging process by assigning work to others. Again mirroring the increasing size of the community, in period P1 one observes an increase in the number of different contributors assigning bug reports, which lasts until the end of period P1 in October 2004. This increase is followed by a decrease during period P2, again coinciding with the activity of Alice. This development was only stopped in March 2008, after Alice had left the project. After a sudden increase at the beginning of P3, the number of different contributors assigning bug reports remained rather stable until 2011, when it experienced another increase.

From the above analysis, we draw the following observation:

Observation: In the period where Alice was active, centralization in the Gentoo community increased steadily.

Figure 4: Centralization in the Gentoo community. Panels: (a) closeness centralization, (b) contributors assigning bugs. Period P2, during which the central contributor Alice was active, is highlighted in green.
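To make the two centralization measures of this section concrete, the following sketch assumes Freeman's formulation of closeness centralization, in which the summed differences to the most central node are normalized by the value attained by a star graph of the same size. The definition in section 3 may differ in detail. The sketch further assumes a connected graph (for instance the LCC) and assign events given as hypothetical (day, assigner) pairs.

```python
from collections import deque
from datetime import date


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centralization(adj):
    """Freeman-style closeness centralization of a connected undirected graph."""
    n = len(adj)
    if n < 3:
        return 0.0
    closeness = []
    for node in adj:
        dist = bfs_distances(adj, node)
        if len(dist) != n:
            raise ValueError("graph must be connected, e.g. restrict it to the LCC")
        closeness.append((n - 1) / sum(dist.values()))
    c_max = max(closeness)
    # Normalization: the sum of differences attained by a star on n nodes.
    star_value = (n - 1) * (n - 2) / (2 * n - 3)
    return sum(c_max - c for c in closeness) / star_value


def distinct_assigners(assign_events, start, end):
    """Number of different contributors with an assign event in [start, end)."""
    return len({who for day, who in assign_events if start <= day < end})


# Hypothetical toy data: a star around one hub, a 4-node ring, a few events.
star = {"alice": {"u1", "u2", "u3"}, "u1": {"alice"}, "u2": {"alice"}, "u3": {"alice"}}
ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
events = [
    (date(2006, 7, 1), "alice"),
    (date(2006, 7, 9), "alice"),
    (date(2006, 7, 20), "u1"),
    (date(2006, 8, 2), "u2"),
]
july_assigners = distinct_assigners(events, date(2006, 7, 1), date(2006, 7, 31))
```

Under this normalization a star, in which one node mediates all paths, scores the maximum, while a ring, in which all nodes are equivalent, scores zero.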
Apart from studying the evolution of collaboration structures, our data set further allows us to study the bug handling performance of the Gentoo community. As the simplest proxy for performance, we measure the rate at which bugs were reported and resolved. We further study the responsiveness of the community in terms of the median time to resolve a bug, i.e. the median time elapsed from the submission of a bug report to the point where it was finally resolved. Similarly, we measure the median time to the first response in terms of any update to the submitted bug report, e.g. the bug being forwarded or assigned, commented on, or its status being changed to reproduced.

Figure 5(a) shows the dynamics of the median number of bugs that were reported and resolved per day. During period P1 one observes a continuous increase both in the number of reported and resolved bugs, which coincides with the growth of the Gentoo community shown in Figure 2(a). During period P2, both the number of reported and resolved bugs decreased, which can again be understood based on the decrease in the number of active contributors shown in Figure 2(a). In both periods P1 and P2, the rates of reporting and resolving bugs closely match each other, thus indicating that - on average - the number of bugs resolved per day matched the number of newly reported bugs. This changed lastingly after Alice had left the project. In period P3 one can observe an increasing discrepancy between the rates at which bugs were reported and resolved, hence indicating a growing number of unresolved, pending bug reports. Furthermore, a remarkable increase in both the number of reported and resolved bug reports can be seen around March 2011, although the discrepancy between both remains. This coincides with an increase in the number of active community members (see Figure 2(a)). One possible explanation is that it coincides with the Gentoo community having a regular LiveDVD release. As it lowers the threshold of using the Gentoo Linux distribution, this can explain an increasing number of contributors submitting bug reports, as well as the increase in the number of different contributors assigning bug reports shown in Figure 4(b).

Apart from the mere number of reported and resolved bugs, an important measure of performance of bug handling communities is the time they take to provide a first response as well as a resolution for a reported bug. This responsiveness is of particular importance, as potential users frequently use it as an indicator when making an informed decision about which software to adopt. Figure 5(b) shows the median time to resolve and to respond to a newly reported bug, in days and hours respectively. Both numbers show remarkable dynamics which coincide with the activity of the central contributor Alice. During period P2, the median time to resolve and respond to a newly submitted bug report was more than one order of magnitude smaller than in the periods P1 and P3.

From our analysis of bug handling performance, we thus draw the following observation:

Observation: During the presence of the central contributor Alice, the bug handling performance of the Gentoo community increased significantly, while her retirement had a lasting negative impact.

Figure 5: Bug handling performance in the Gentoo community. Panels: (a) daily activity, (b) time to first reply and resolution. Period P2, during which the central contributor Alice was active, is highlighted in green.
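The two responsiveness measures can be computed along the following lines. The sketch assumes that each bug report is a record with a submission timestamp and optional timestamps for the first update and for the resolution; the field names `reported`, `first_response` and `resolved` are hypothetical and do not reflect the actual schema of the bug tracker data.

```python
from datetime import datetime
from statistics import median


def median_hours(bugs, start_key, end_key):
    """Median elapsed time in hours between two timestamps.

    Reports whose end timestamp is missing (e.g. still unresolved) are skipped.
    """
    durations = [
        (bug[end_key] - bug[start_key]).total_seconds() / 3600.0
        for bug in bugs
        if bug.get(end_key) is not None
    ]
    return median(durations) if durations else None


# Hypothetical toy reports; the third one has not been resolved yet.
bugs = [
    {"reported": datetime(2006, 7, 1, 8, 0),
     "first_response": datetime(2006, 7, 1, 10, 0),
     "resolved": datetime(2006, 7, 2, 8, 0)},
    {"reported": datetime(2006, 7, 3, 9, 0),
     "first_response": datetime(2006, 7, 3, 9, 30),
     "resolved": datetime(2006, 7, 5, 9, 0)},
    {"reported": datetime(2006, 7, 4, 12, 0),
     "first_response": datetime(2006, 7, 4, 18, 0),
     "resolved": None},
]
hours_to_first_response = median_hours(bugs, "reported", "first_response")
hours_to_resolution = median_hours(bugs, "reported", "resolved")
```

Skipping unresolved reports is a simplification made for this sketch: when many reports stay open, as described for period P3, a median over resolved reports alone may understate the time to resolution.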
We close this section by combining our quantitative results with personal insights shared by three long-term contributors: Alice, Bob and Chris. By this, we substantiate our interpretation of Alice's role during period P2 and the consequences of her presence for the cohesion and performance of the Gentoo community.

In sections 4.1, 4.2, and 4.3, we pointed out that during period P2 the community experienced a significant loss of cohesion, as well as an increase of centralization and performance. We further argued that during period P2, most of the collaboration was mediated by a small subset of contributors, Alice herself being at the core of this group. Indeed, in response to our questionnaire, Alice describes that she “was practically the only person involved in bug wrangling”. Bob confirms that Alice “had been doing our bug wrangling more or less alone for a few years”. Alice complements this picture by saying that the “workload at that time - [if I recall correctly] - was about 4 hours a day, probably more in case I did not have time to do the bug wrangling for a day or two”. As a consequence of this centralization of bug handling tasks, our analysis shows a significant increase in performance during period P2, measured in terms of response time and bug resolution rate. This finding is confirmed by the community, and Bob attributes it to the fact that “having a single person on the task greatly helps in finding duplicate bug reports”. Furthermore, he argues that having “more [contributors] would water down the quality”.

A further observation of our study is that the cohesion of the community (measured e.g. in terms of mean degree, clustering coefficient or algebraic connectivity) decreased significantly during the presence of Alice. This is an interesting observation as it highlights secondary effects of the presence of a central contributor on the evolution of collaboration structures within the remaining community. Although it is necessarily difficult to make any substantiated claims about causality, one may conjecture that it is the mere presence and dedication of a central contributor that drives this loss of cohesion. Bob indirectly confirms this by arguing that apparently “our bugtracker’s users had come to rely on a single person to “assist” them in finding and fixing bugs”.

For the community, the retirement of
Alice was perceived as an unexpected event. According to Bob, in 2008 Alice “suddenly left the project”. He further confirms that she “stopped unexpectedly”. Clearly, one of the most interesting questions that cannot be answered by a quantitative study alone is why Alice decided to leave the community. She answered our question about the underlying reasons as follows: “I would mostly attribute that to a serious loss of motivation caused by disruptive social environment in the project as a whole”. Moreover, she highlights her dissatisfaction with “more and more time being spent on bureaucracy, "paperwork", and creating of useless structures within the project”. In contrast, Chris - another prominent contributor - remarks that “some people find formalization to be an unnecessary bureaucratic barrier, but when you get to be as big as Gentoo, it’s pretty much inevitable”.

Independently of the reasons for Alice's retirement, the risk of relying too much on a central contributor became obvious in a remarkable event during period P2, when Alice was still active. In early 2007, according to her own account, Alice was “repeatedly subject to [...] disciplinary proceedings and [she] was suspended from the project for a couple of weeks” due to a verbal conflict with another contributor. Around this time, a sudden and short increase in the response time (see Figure 5(b)) as well as a decrease in closeness centralization (see Figure 4(a)) can be observed, thus serving as an early warning sign of the problems to come when Alice would leave. Despite this early indicator, it was only after Alice had left that the community took measures to reorganize itself. In particular, Bob initiated the Bug Wranglers project, which a) called for more contributors in bug handling and b) established formal procedures regarding the tasks and goals of bug triaging. In response to our questions, Bob describes the project as a success, arguing that “the targets that relate to the content of bug reports are now usually met when serious bug wranglers review them”. However, despite this initiative, our finding of a lasting negative impact on bug handling performance after the resignation of Alice is confirmed by Bob, saying that the “goal of responding to bugs within a day is still something to work on”.
We now discuss limitations of our analysis and highlight possible threats to validity. Since our paper is a case study focused on the Gentoo community, we cannot make any claims about the general applicability of our results. Even though our study as well as the feedback by the community provide some interesting hints, we would further like to emphasize that we cannot make conclusive statements regarding the causal relation between increasing centralization, performance and cohesion. In particular, we cannot rule out external reasons driving both the increase of centralization and the loss of cohesion in the community. Despite this limitation, we argue that our case study is interesting by itself, being a valuable addition to the literature on benefits and risks of centralization in collaboration topologies. In order to validate our findings, we thus call for similar studies on OSS communities and other collaborative software engineering projects.

Another possible concern is the choice of our network construction procedure as well as the choice of the length of the sliding window in our dynamic analysis. In order to only extract meaningful collaboration events, and facilitated by the size of our data set, we only considered cc and assign collaborations. Nevertheless, it is clear that taking into account further relations, e.g. comments, could possibly augment our perspective of collaboration topologies. At the same time, even though we have explored different sizes for the sliding window, we did not see any qualitative change of our results. Eventually, we decided to include the results for a window size of one month, since this period is long enough to include collaborations of more occasional collaborators. At the same time, a one month period is short enough to not aggregate collaborations occurring far apart in time. As such, our methodology of performing a dynamic network analysis can be seen as a strength compared to the simpler approach of considering a single time-aggregated network.

Footnote on the Bug Wranglers project: see the website of the Bug Wranglers project.
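The windowed network construction discussed above can be sketched as follows. The sketch rests on several assumptions that are not confirmed by the text: collaboration events are given as hypothetical (day, source, target) tuples extracted from cc and assign actions, links are treated as undirected, and a window of 30 days stands in for the one month period mentioned above. The parameters `window_days` and `step_days` are illustrative names.

```python
from datetime import date, timedelta


def windowed_edges(events, window_days=30, step_days=30):
    """Yield (window_start, edge_set) for successive time windows.

    `events` holds (day, source, target) tuples. Repeated collaborations
    between the same pair within one window collapse into a single edge.
    Choosing step_days < window_days produces overlapping (sliding) windows.
    """
    events = sorted(events)
    if not events:
        return
    start, last_day = events[0][0], events[-1][0]
    while start <= last_day:
        end = start + timedelta(days=window_days)
        edges = {
            frozenset((u, v))
            for day, u, v in events
            if start <= day < end and u != v
        }
        yield start, edges
        start += timedelta(days=step_days)


# Hypothetical toy events: three in July (two between the same pair), one in August.
toy_events = [
    (date(2006, 7, 1), "alice", "u1"),
    (date(2006, 7, 15), "alice", "u2"),
    (date(2006, 7, 15), "u1", "alice"),
    (date(2006, 8, 5), "u3", "u4"),
]
windows = list(windowed_edges(toy_events))
```

Each yielded edge set can then be analyzed separately, which keeps collaborations that occurred far apart in time from being merged, as a single time-aggregated network would do.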
The main contributions of our paper are the following:

• We study the dynamics of social organization and performance in the bug handling community of Gentoo.
• We find a period in which the activity of a single contributor resulted in a significant increase of centralization and performance.
• Our analysis further shows that the period when the central contributor was active coincided with a significant decrease of cohesion.
• We further find that the loss of the central contributor had a lasting negative impact on the bug handling performance of the community.

To the best of our knowledge, our paper is the first to quantitatively study how the rise and fall of a central contributor impact the social organization and performance of an OSS community. Even though the general statements that can be drawn from a case study are necessarily limited, we argue that our work highlights interesting directions for future research. We would like to emphasize that the quantitative measures used in our study allow us to clearly identify shifts in the social organization that are confirmed by insights from actual contributors. As such, we argue that these measures can potentially be used in monitoring tools suitable to augment the social awareness of community managers.
Acknowledgment
This work was supported by the SNF through grant CR12I1_125298 and by the European Commission's Seventh Framework Programme FP7-ICT-2008-3 under grant agreement No. 231323 (CYBEREMOTIONS). We acknowledge the contribution of Emre Sarigöl to data collection and processing and would like to thank the Gentoo community members Alice, Bob and Chris for sharing personal insights with us.
References

[1] Bettenburg, N., Premraj, R., and Zimmermann, T. Duplicate bug reports considered harmful really? In (2008), IEEE, pp. 337–345.
[2] Crowston, K., and Howison, J. The social structure of Free and Open Source Software development. First Monday 10 (2005).
[3] Crowston, K., and Scozzi, B. Coordination practices within FLOSS development teams: The bug fixing process. Computer Supported Activity Coordination (2004).
[4] Csardi, G., and Nepusz, T. The igraph software package for complex network research. InterJournal Complex Systems (2006), 1695.
[5] Ehrlich, K., and Cataldo, M. All-for-one and one-for-all?: a multi-level analysis of communication patterns and individual performance in geographically distributed software development. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (New York, NY, USA, 2012), CSCW '12, ACM, pp. 945–954.
[6] Evans, P., and Wolf, B. Collaboration rules. Harvard Bus. Rev. 7 (2005), 96–103.
[7] Giuri, P., Rullani, F., and Torrisi, S. Explaining leadership in virtual teams: the case of open source software. Inform. Econ. Policy 20 (2008), 305–315.
[8] Mockus, A., Fielding, R. T., and Herbsleb, J. D. Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology 11, 3 (2002), 309–346.
[9] Newman, M. E. J. Networks: an introduction. Oxford Univ Press, 2010.
[10] Sadowski, B., Sadowski-Rasters, G., and Duysters, G. Transition of governance in a mature open software source community: evidence from the Debian case. Inf. Econ. Policy 20 (2008), 323–332.
[11] Scozzi, B., Crowston, K., Yeliz Eseryel, U., and Li, Q. Shared mental models among open source software developers. In Hawaii International Conference on System Sciences, Proceedings of the 41st Annual (2008), IEEE, pp. 306–306.
[12] Wasserman, S., and Faust, K. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
[13] West, D. B. Introduction to Graph Theory, second ed. Prentice Hall, 2001.
[14] Wolf, T., Schroter, A., and Damian, D. Mining task-based social networks to explore collaboration in software teams. IEEE Software (2009), 58–66.
[15] Zanetti, M. S., Sarigöl, E., Scholtes, I., Tessone, C. J., and Schweitzer, F. A quantitative study of social organisation in open source software communities. In ICCSW (2012), pp. 116–122.
[16] Zanetti, M. S., Scholtes, I., Tessone, C. J., and Schweitzer, F. Categorizing bugs with social networks: A case study on four open source software communities. Accepted for the 35th ICSE - Software Engineering in Practice track, 2013.
[17] Zhou, M., and Mockus, A. What make long term contributors: Willingness and opportunity in OSS community. In 2012 34th International Conference on Software Engineering (ICSE).