Does the h α index reinforce the Matthew effect in science? Agent-based simulations using Stata and R
Lutz Bornmann*, Christian Ganser+, Alexander Tekles*+, & Loet Leydesdorff$

* Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society, Hofgartenstr. 8, 80539 Munich, Germany. Email: [email protected]

+ Ludwig-Maximilians-Universität Munich, Department of Sociology, Konradstr. 6, 80801 Munich, Germany.

$ University of Amsterdam, Amsterdam School of Communication Research (ASCoR), PO Box 15793, 1001 NG Amsterdam, The Netherlands. Email: [email protected]
Abstract
Recently, Hirsch (2019a) proposed a new variant of the h index called the h α index. He defined it as follows: “we define the h α index of a scientist as the number of papers in the h-core of the scientist (i.e. the set of papers that contribute to the h-index of the scientist) where this scientist is the α-author” (p. 673). The h α index was criticized by Leydesdorff, Bornmann, and Opthof (2019). One of their most important points is that the index reinforces the Matthew effect in science. We address this point in the current study using a recently developed Stata command (h_index) and R package (hindex), which can be used to simulate h index and h α index applications in research evaluation. The user can investigate under which conditions h α reinforces the Matthew effect. The results of our study confirm what Leydesdorff et al. (2019) expected: the h α index reinforces the Matthew effect. This effect can be intensified if strategic behavior of the publishing scientists and cumulative advantage effects are additionally considered in the simulation.

Key words: bibliometrics, h index, h α index, Matthew effect, agent-based simulation, bibliometrics-based heuristics

Introduction
The h index introduced by Hirsch (2005) is one of the most popular bibliometric indicators worldwide. The paper by Hirsch (2005) has been cited more than 3500 times (date of search in Web of Science, WoS, Clarivate Analytics: March 2019). The h index has been adopted, among other indicators, in WoS and Scopus (Elsevier). In the bibliometrics literature, however, many critical points have been raised about it: for example, Waltman and van Eck (2012) argue that “for the purpose of measuring the overall scientific impact of a scientist (or some other unit of analysis), the h-index behaves in a counterintuitive way. In certain cases, the mechanism used by the h-index to aggregate publication and citation statistics into a single number leads to inconsistencies in the way in which scientists are ranked” (p. 406). Furthermore, the counting of papers with citation numbers ≥h has not been justified by Hirsch (2005); it is equally possible to count papers with citation numbers ≥h or h/2 (Egghe, 2006a; Egghe, 2006b). Since the introduction of the h index, many variants have been proposed targeting one or several of its disadvantages. Bornmann, Mutz, Hug, and Daniel (2011) concluded on the basis of a meta-evaluation that most of these variants correlate highly: “depending on the model, the mean correlation coefficient varies between .8 and .9. This means that there is redundancy between most of the h index variants and the h index” (p. 346). Recently, Hirsch (2019a) himself proposed a new variant called the h α index: “we define the h α index of a scientist as the number of papers in the h-core of the scientist (i.e. the set of papers that contribute to the h-index of the scientist) where this scientist is the α-author” (p. 673). Hirsch (2019a) recommended using the new index in combination with the h index: “a high h index in conjunction with a high h α ∕h ratio is a hallmark of scientific leadership” (p. 673).
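The two definitions can be made concrete with a short sketch. The following Python snippet (used here purely for illustration; the tools presented below are implemented in Stata and R) computes h and h α from a list of papers, assuming that each paper carries a flag indicating whether the scientist in question is its α-author and that the h-core consists of the h most highly cited papers:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    while h < len(cites) and cites[h] >= h + 1:
        h += 1
    return h

def h_alpha_index(papers):
    """papers: list of (citations, is_alpha_author) pairs.

    h_alpha = number of alpha-authored papers in the h-core. The h-core
    is taken here to be the h most highly cited papers; ties at the
    core boundary would need a tie-breaking rule, which we gloss over.
    """
    h = h_index([c for c, _ in papers])
    core = sorted(papers, key=lambda p: p[0], reverse=True)[:h]
    return sum(1 for _, is_alpha in core if is_alpha)
```

For example, a scientist with papers cited (10, 8, 6, 5, 2) times has h = 4; if she is the α-author of the papers with 10 and 6 citations, then h α = 2 and h α/h = 0.5.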
The h α index was criticized by Leydesdorff et al. (2019). One of their most important critical points is that the index “adds the normative element of reinforcing the Matthew effect in science” (p. 1163). The Matthew effect was defined by Merton (1968) as follows: “the Matthew effect consists in the accruing of greater increments of recognition for particular scientific contributions to scientists of considerable repute and the withholding of such recognition from scientists who have not yet made their mark” (p. 58). Merton (1968) cites a physicist as follows: “The world is peculiar in this matter of how it gives credit. It tends to give the credit to (already) famous people” (p. 57). The Matthew effect is very similar to Price’s (1976) “cumulative advantages”, which he noted as a core mechanism in the sciences explaining, among other things, the skewed distributions in indicator values. Barabási (2002) reinvented Price’s cumulative advantages and Merton’s Matthew effect as “preferential attachment” without any knowledge of or reference to this bibliometric literature (Bonitz, Bruckner, & Scharnhorst, 1999). Hirsch (2019b) partly denied that the h α index reinforces the Matthew effect in science: “Strictly speaking at most half of this is true, the higher h-index author in a collaboration benefits, however the lower h-index author does not get negatively affected, his/her h α remains the same. More importantly, lower h-index authors have the choice to not collaborate with high h-index authors but rather pursue their own independent work, or work with more junior collaborators” (p. 1168). We agree with Hirsch (2019b) that authors or co-authors with low h index values will (probably) not become “poorer” and that nothing is taken away from them. The first problematic point in his statement is, however, the implicit demand to search strategically for cooperation in science.
Following the norms in the ethos of science (Merton, 1942, 1973), cooperation partners should be selected based on the quality of their research or their fit to the expertise needed for a certain research project, but not for non-scientific reasons such as the increase of indicator values. Supervisor–supervisee relationships are another reason for co-authorship besides h index values that counteracts co-author selection based on purely scientific grounds (although this can scarcely be avoided in science). Since Hirsch (2019b) partly rejected our claim, the second problematic point in his statement above is the remaining uncertainty about the reinforcement of the Matthew effect by using h α in research evaluation. Thus, we address this point in the current study. We used a recently developed Stata command (h_index) and R package (hindex), which can be used to simulate h index and h α index applications in research evaluation. Based on fictitious data, the user can empirically investigate whether or not h α reinforces the Matthew effect.

Literature overview and conceptual roots
The role of simulations in scientometrics
Although simulations are not a focus of the bibliometric literature, both bibliometrics and simulation studies have been used as quantitative methods in quantitative science and technology studies (e.g., Ahrweiler, 2001; Edmonds, Gilbert, Ahrweiler, & Scharnhorst, 2011; Scharnhorst, Börner, & van den Besselaar, 2012). Gilbert (1997) set the stage with the first simulations of the structure and dynamics of academic science. He introduced “kenes” as knowledge-variants of genes; the resulting events showed Lotka-type distributions and were interpretable using Simon’s (1957) models of social processes. Ahrweiler – in collaboration with two co-authors – developed a large innovation model called SKIN: “Simulating Knowledge Dynamics in Innovation Networks” (Ahrweiler, Pyka, & Gilbert, 2004, 2011). Different from data-oriented studies, simulations enable us to theorize mechanisms and to specify expectations. Not only observable behavior but also coordination and selection mechanisms can be studied. Leydesdorff and van den Besselaar (1998), for example, showed that the Cobb-Douglas production function can be elaborated into a representation of technological trajectories and technological regimes by assuming feedback mechanisms (Leydesdorff & van den Besselaar, 1994). In a similar vein, one can simulate lock-ins and deadlocks in technological innovation (Leydesdorff, 2001; Leydesdorff & van den Besselaar, 1998) and synergy in Triple-Helix models (Ivanova & Leydesdorff, 2014). In the confrontation with data, these insights into mechanisms can be developed into what Bornmann and Marewski (in press) elaborated as bibliometrics-based heuristics (BBH, see section 2.3). During the early 2000s, this focus on the content of science and technology in more abstract (knowledge-based) terms disappeared because of the popularity of agent-based modeling in neighboring disciplines (Edmonds, Hernandez, & Troitzsch, 2007; Tesfatsion, 2002).
Leydesdorff (2015) argued for a focus on (genotypic) mechanisms instead of phenotypical behavior. From this perspective, the observable dynamics of the sciences can be studied evolution-theoretically (Campbell, 1991; Distin, 2010; Hodgson & Knudsen, 2011; Ionescu & Chopard, 2013; Popper, 1972). Meyer, Lorscheid, and Troitzsch (2009) provide a bibliometric analysis of the first decade of the Journal of Artificial Societies and Social Simulation (JASSS). The Matthew effect itself has been simulated extensively (for example, in physics) under the heading of preferential attachment (Abbasi, Hossain, & Leydesdorff, 2012; Barabási, 2002; Barabási et al., 2002; Bonitz et al., 1999; Garavaglia, van der Hofstad, & Woeginger, 2017; Newman, 2001; Petersen et al., 2014). In a recent study, Backs, Günther, and Stummer (2019) used agent-based modeling as a decision support system when planning measures to encourage academic patenting within universities. The authors suggest “the application of agent-based modeling and simulation, an approach that has been successfully used in other, similar, contexts (e.g., when selecting useful measures for market introduction and diffusion of new products). We have presented herein an agent-based model that is suitable for this purpose, and we have demonstrated its applicability and its potential value for practice (i.e., TTO [technology transfer offices] management drives increased patenting) and subsequently for society (i.e., more academic patents lead to an increase in knowledge transfer between universities and industry and/or provide a basis for spin-off companies) by means of an application example” (p. 454). You, Han, and Hadzibeganovic (2016) used an agent-based simulation model to assess how scientists’ work efficiency and their capability to select important topics for their research affect the h index (and other measures). In this simulation model, the agents (authors or research teams) try to occupy nodes in a citation network (publications). By providing the citation network a priori, the simulations focus on the process of competing for possible publications, rather than on the collaboration or the citation process. The model proposed by You et al. (2016) is an example of how the influence of individuals’ actions on macro-level patterns can be analyzed by means of simulations in scientometrics.
Besides that, we are aware of only a few simulation studies in scientometrics which focus on the h index. These simulations – as a rule – have dealt with the development of single h index values without considering collaborations between scientists. Lobet (2016) published an h index evolution simulator which shows the development of single h index values based on various inputs (e.g., starting year of publishing, papers per year). The simulator is able to consider certain behaviors of researchers, for example, always citing one’s own papers. Guns and Rousseau (2009) investigated the h index’s growth based on computer simulations of publication and citation processes. They found that “in most simulations the h-index grows linearly in time. Only occasionally does an S-shape occur, while in our simulations a concave increase is very rare” (p. 410). Ionescu and Chopard (2013) published two agent-based models which refer to performance measurements of single scientists and a group of scientists (see also Żogała-Siudem, Siudem, Cena, & Gagolewski, 2016). They studied, for example, what happens when low h-index researchers are removed from a community. Their results suggest “a stratified structure of the scientific community, in which the lower h levels mostly cite papers from the upper h levels” (p. 426).
Analytical sociology
This study follows the approach of analytical sociology, which focuses on the mechanisms leading to social phenomena (Hedström, 2005; Hedström & Ylikoski, 2010). It is the goal of analytical sociology to work out the mechanisms (on the micro level) which are the causes of the phenomena (on the macro level) (Bornmann, 2010). In this study, we are interested in whether the phenomenon “Matthew effect” can be produced by the mechanism “h α index”. In our simulation, the action is on the micro level, since action (publishing, being cited, collaborating, and performance measuring) is done on the single-agent level, and the possible outcome is on the macro level – structures in the form of certain h α index distributions. In order to test the relationship between mechanism and phenomenon in this study, several agent-based simulations have been performed using the Stata h_index command (and the corresponding R package). Most of the model parameters are held constant over all simulations; compared to a baseline simulation, only one parameter is changed in each of three further simulations to inspect the effect of this parameter. The interested reader of this paper can use the command or package to investigate the effect of further parameter variations.

Bibliometrics-based heuristics
The h_index command and the hindex package can be used to define rules for running various simulations. For example, we work with certain distributions of h index values as starting points and define how the agents in the simulation interact. The simulations then provide an experimental view of the effects of using the h α index in research evaluation. Recently, Bornmann and Marewski (in press) introduced BBHs. They discuss the use of bibliometrics in research evaluation against the backdrop of the fast-and-frugal research program (e.g., Gigerenzer, Todd, & ABC Research Group, 1999) and define BBHs as decision strategies in research evaluation which ignore much of the available information and use only limited data about an entity (i.e., citation and publication data of a researcher) to assess the entity. The application of heuristics in many other environments, for instance business, medicine, sports, and crime (Gigerenzer & Gaissmaier, 2011), has shown that they come to judgments similarly good as those of more complex decision strategies. Following the fast-and-frugal research program, Bornmann and Marewski (in press) define some search, stopping, and decision rules for the use of BBHs. These rules help to define and apply BBHs for a certain research evaluation environment. For example, these rules can be defined as follows: In economics, publications in so-called top-five journals (American Economic Review, Econometrica, Journal of Political Economy, Quarterly Journal of Economics, and Review of Economic Studies) decide about scientific careers (Bornmann, Butz, & Wohlrabe, 2018); reaching a professorship without having published in these journals is frequently not possible. The search, stopping, and decision rules for filling a professorship can thus be defined as follows: (1) search for all publications of a group of candidates (economists); (2) stop the search when all publications have been identified; (3) select the candidate with the most papers in top-five journals. One important element in the fast-and-frugal research program is the investigation of heuristics in certain environments: do they lead to reliable and valid judgments? Is the application of certain bibliometric indicators in the environment reasonable? Does the indicator’s use lead to undesired effects? In this paper, we follow the BBH approach by studying the possible advantages and disadvantages of using the h α index in research evaluation. We especially focus on the assumed sensitivity of the h α index to the Matthew effect.

Implementation of our simulation model in Stata and R
The Stata ado h_index and the R package hindex simulate agents who collaborate on publishing papers. In Stata, type net install h_index, from(https://raw.githubusercontent.com/chrgan/h_index/master/) to install the ado. The R package hindex can be installed by typing devtools::install_github("atekles/hindex"). The simulation procedure is as follows: (1) As a starting point, n agents are generated. The user can specify n, the number of agents. The agents have published in the past. The user can choose between a Poisson and a negative binomial distribution for the number of previously published papers and set the parameters of the distribution. It is assumed that each paper has been written one to five periods ago (imagine years, for example). For a share of these papers, which the user can specify, the agent is the alpha author. (2) For this set of n agents, the h index and h alpha index are calculated. (3) Then, the agents start to collaborate. The previously published papers might be the result of collaborations among the simulated agents or with other agents; this does not matter for the rest of the simulation procedure. The user can specify for how many periods the agents collaborate. In each period, the agents form teams publishing new papers. The user can set some properties: the average number of co-authors, the share of agents who collaborate in each period, and the correlation between the probability of co-authoring with other agents and the h index values calculated in step 2. Thus, one can specify that agents with high initial h index values are more productive than agents with low initial h index values. By default, the collaborating agents are assigned to co-authorships at random. However, it is possible to specify that agents with high h index values avoid co-authorships with agents who have equal or higher h index values. In this case, the agents with high h index values strategically select co-authors to improve their h α.
The Stata module moremata has to be installed in advance (Jann, 2005). The R package devtools has to be installed in order to install the hindex package this way.
(4) All papers receive citations each period. The number of citations depends on (a) the citation distribution and (b) the age of the paper. (a) The user can choose between a Poisson and a negative binomial distribution and can specify the maximum expected number of citations. (b) The expected number depends on the papers’ age following a log-logistic function. It first increases with time, reaches the maximum specified in step 4(a) after a configurable number of periods, and then decreases. The steepness of the log-logistic function can be specified, too. Thus, for each given age of the papers, the number of citations follows the distribution specified in step 4(a) with an expected citation number given by its maximum and the age of the paper. A graph showing the distribution of the expected values can be generated. To reflect the possibility of self-citations, the user can specify an option leading to one additional citation for each paper (published at least one period ago) where at least one of its authors has an h index value which exceeds the number of previous citations of the paper by one or two. This reflects agents strategically citing their own papers which have citation counts just below their h index value; it accelerates the growth of the agents’ h index values. Finally, a “boost” effect can be specified: papers of agents with higher h index values are cited more frequently than papers of agents with lower h index values. (5) For each period, the new values of h and h alpha are calculated. The alpha author of a paper can either be determined at the time of its publication (without changing later on) or be re-determined after each period of action based on the current h index values of the authors (see Tietze, Galam, & Hofmann, 2019). (6) To ensure the robustness of the results, steps 1 to 5 are repeated r times.

Results
The Matthew effect implies that the more reputable scientist receives more credit than the less reputable scientist for a scientific contribution, although the contribution is of the same scientific quality. Thus, the credit is not attributed fairly on the basis of the performed contribution, but (unfairly) on the basis of previous contributions. If we compare this definition of the Matthew effect with the definition of the h α index, the similarities are obvious. In the case of the h α index, the credit for a paper is assigned to the co-author with the highest h index. Although all authors contributed to the co-authored paper in question, only one author receives the full credit. Furthermore, the credit is assigned to the co-author who is most reputable in terms of h index values. These similarities between the definitions of the Matthew effect and the h α index already suggest that the simulations presented in the following can be expected to reveal the appearance of the Matthew effect when the h α index is used in performance measurement.

First agent-based simulation with 200 agents (baseline simulation)
Similar to the BBH program with search, stopping, and decision rules (see above), the first agent-based simulation has three phases: initial setting, acting (collaborating) over several periods, and a final dataset for further analysis (visualization of the results). Whereas the initial setting and the final dataset are on the macro level (certain distributions are set or analyzed), acting is on the micro level (see section 2.2). It is the goal of the first agent-based simulation – the baseline simulation, compared to which one parameter is changed in each of the simulations presented in the following sections – to compare the mean h α index values of agents with initially low or high h index values after several periods of action (e.g., collaboration with other agents). The Stata command for the first agent-based simulation is h_index, r(50) n(200) per(20) co(3) dp(poisson, mean(10)) dc(poisson, mean(5)) p(3) sh(.33) clear. Initial setting: The first simulation is based on 200 agents [ n(200) ]. The agents have published on different output and impact levels: the number of papers follows a Poisson distribution, and the agents have published 10 papers on average [ dp(poisson, mean(10)) ]. For 1/3 of all papers published by an agent, the agent itself is the alpha author (-agent) [ sh(.33) ]. h index and h α index values are calculated for all agents. Acting: Agents act (publish, collaborate, receive citations) across 20 periods [ per(20) ]. Each collaborating group of agents has three agents on average [ co(3) ]. The citations that the co-authored papers receive follow a Poisson distribution with a specified time-dependent expected value [ dc(poisson, mean(5)) ]. The time-dependent expected value follows a log-logistic distribution reaching its maximal value of 5 after 3 years (following the general guideline by Glänzel & Schöpflin, 1995) [ p(3) ].
The agent-based simulation is repeated 50 times [ r(50) ] to ensure the robustness of the simulation.
After each simulation, new h index and h α index values are calculated for all agents. The equivalent function call to produce the simulated data in R is: simulate_hindex(runs = 50, n = 200, periods = 20, coauthors = 3, distr_initial_papers = 'poisson', dpapers_pois_lambda = 10, distr_citations = 'poisson', dcitations_mean = 5, dcitations_peak = 3, alpha_share = .33)
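The time-dependent expected citation value described in step 4 can be sketched as follows. This Python snippet (illustrative only; the exact parametrization in h_index/hindex may differ, and the steepness value is a hypothetical choice) scales a log-logistic density so that the expectation peaks at dcitations_mean = 5 after dcitations_peak = 3 periods:

```python
import math

def expected_citations(age, peak_age=3.0, peak_value=5.0, steepness=2.5):
    """Expected number of citations for a paper of the given age.

    Follows a log-logistic shape: the expectation first rises, reaches
    peak_value at peak_age, and then declines. The per-period citation
    counts would then be drawn from, e.g., a Poisson distribution with
    this expectation.
    """
    b = steepness
    # Scale parameter chosen so the mode of the log-logistic pdf is peak_age.
    a = peak_age / ((b - 1.0) / (b + 1.0)) ** (1.0 / b)
    def pdf(x):
        return (b / a) * (x / a) ** (b - 1.0) / (1.0 + (x / a) ** b) ** 2
    return peak_value * pdf(age) / pdf(peak_age)
```

With these defaults, the expectation is exactly 5 at age 3 and decays slowly for older papers, mirroring the obsolescence pattern described above.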
Figure 1. Results of the first agent-based simulation: mean h α index values for two groups of agents with low (<7) and high (>7) initial h index values

Final dataset: Two groups of agents are defined with low (<7) and high (>7) initial h index values (7 is the median initial h index value). For each period with actions (20 in total), the mean h α index values are computed (across the 50 repetitions of the simulation to obtain robust results). The results are shown in Figure 1. For each period with actions, the advantage of the agents with high initial h index values is clearly visible: they not only start with higher mean h α index values (which is as expected); these values also increase with additional periods – reflecting further cooperation, publications, and additional citations. The mean h α index values of the agents with low initial h index values also increase over time. However, the difference between the two groups becomes larger with onward periods – as the green line in Figure 1 demonstrates. Increasing differences between both groups can be interpreted as a Matthew effect in operation.

Second agent-based simulation with an additional element leading to more citations for prolific agents
The second simulation has been run using the Stata command h_index, r(50) n(200) per(20) co(3) dp(poisson, mean(10)) dc(poisson, mean(5)) p(3) sh(.33) boost(size(.5)). It is the same command as in the first agent-based simulation (the baseline simulation), but we introduce a new element with boost(size(.5)). This option means that papers published by agents with higher h index values are cited more frequently than papers published by agents with lower h index values. The results of Frandsen and Nicolaisen (2017) demonstrate, for instance, that “authors in the field of Healthcare tend to cite highly cited documents when they have a choice. This is more likely caused by differences related to quality than differences related to status of the publications cited” (p. 1278). The number of citations in the second simulation is increased based on the value specified with [ size(.5) ]. For example, suppose the highest h index value among the agents who have published a certain paper is 11. The value .5 then means that this paper receives round(11*.5)=6 additional citations.
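Following the round(11*.5)=6 example, the boost rule can be sketched in Python (illustrative only; rounding half away from zero, as in Stata's round(), is assumed here):

```python
import math

def boosted_citations(base_citations, max_author_h, boost_size=0.5):
    """Apply the 'boost' option: a paper gains round(h_max * boost_size)
    extra citations, where h_max is the highest h index value among its
    authors. Rounding half away from zero (Stata-style) is assumed."""
    return base_citations + math.floor(max_author_h * boost_size + 0.5)
```

With max_author_h = 11 and boost_size = .5, a paper thus gains 6 additional citations, as in the example above.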
Final dataset: In the second agent-based simulation, the median of the initial h index values (median = 7) is the same as in the first simulation. Thus, two groups of agents are defined with low (<7) and high (>7) initial h index values. Figure 2 presents the results. The results are similar to those in Figure 1, but the differences between both groups are more pronounced: whereas the h α index values of the group with high initial h index values increase more steeply, the h α index values of the group with low initial h index values increase similarly to those in Figure 1. This leads to larger differences in mean h α index values between both groups (as the green line reveals). In other words, the Matthew effect is reinforced by letting the papers published by agents with higher h index values be cited more frequently than those published by agents with lower h index values.

The equivalent function call to produce the simulated data in R is: simulate_hindex(runs = 50, n = 200, periods = 20, coauthors = 3, distr_initial_papers = 'poisson', dpapers_pois_lambda = 10, distr_citations = 'poisson', dcitations_mean = 5, dcitations_peak = 3, alpha_share = .33, boost = TRUE, boost_size = .5)

Figure 2. Results of the second agent-based simulation: mean h α index values for two groups of agents with low (<7) and high (>7) initial h index values

Third agent-based simulation considering the correlation of new publications with h index values: agents with high h index values are disproportionately productive
For the third simulation, the following Stata command has been used: h_index, r(50) n(200) per(20) co(3) dp(poisson, mean(10)) dc(poisson, mean(5)) p(3) sh(.33) dil(correlation(.8) share(.6)). As in the second simulation, only one option has been changed in comparison to the baseline simulation. The new option [ dil(correlation(.8) share(.6)) ] focuses on the probability of publishing new papers depending on initial h index values. The option [ correlation(.8) ] means that agents with high initial h index values are more productive than agents with low initial h index values: the correlation between the probability of publishing new papers and initial h index values has been set to .8. The option [ share(.6) ] means that 60% of the agents publish. The use of this option can be justified, for instance, by the so-called “sacred spark” theory (Cole & Cole, 1973), which claims “that there are substantial, predetermined differences among scientists in their ability and motivation to do creative scientific research” (Allison & Stewart, 1974, p. 596). The third agent-based simulation is intended to check whether the higher productivity of prolific agents has an effect on the development of the h α index values of the groups with high and low initial h index values. The equivalent function call to produce the simulated data in R is: simulate_hindex(runs = 50, n = 200, periods = 20, coauthors = 3, distr_initial_papers = 'poisson', dpapers_pois_lambda = 10, distr_citations = 'poisson', dcitations_mean = 5, dcitations_peak = 3, alpha_share = .33, diligence_corr = .8, diligence_share = .6)
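The diligence option can be sketched as follows. This Python snippet uses a latent-propensity construction, which is one standard way to induce a target correlation between publishing and the h index; it is only an assumption about how dil() might work, not the package's actual implementation:

```python
import math
import random

def select_publishing_agents(h_values, corr=0.8, share=0.6, rng=None):
    """Pick which agents publish in a given period.

    Each agent gets a latent propensity mixing the standardized h index
    with Gaussian noise, so that propensity and h index correlate at
    roughly `corr`; the top `share` of agents by propensity publish.
    """
    rng = rng or random.Random(42)
    n = len(h_values)
    mean = sum(h_values) / n
    sd = math.sqrt(sum((h - mean) ** 2 for h in h_values) / n) or 1.0
    propensity = [
        corr * (h - mean) / sd + math.sqrt(1.0 - corr ** 2) * rng.gauss(0.0, 1.0)
        for h in h_values
    ]
    k = round(n * share)
    order = sorted(range(n), key=lambda i: propensity[i], reverse=True)
    return set(order[:k])
```

With corr = .8 and share = .6, 60% of the agents publish in a period, and agents with high h index values are strongly overrepresented among them.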
Figure 3. Results of the third agent-based simulation: mean h α index values for two groups of agents with low (<7) and high (>7) initial h index values

Final dataset: The results of the third simulation are presented in Figure 3. Whereas this third simulation considers a positive correlation between productivity and h index values, the second simulation included a positive relationship between citations and h index values (see Figure 2). The findings in Figure 3 differ markedly from the results in Figure 1 and Figure 2. As the green line for the differences between the mean h α index values reveals, the differences increase more strongly over time than in Figure 2. Thus, the results of both simulations indicate that an effect on the h α index values can especially be expected when additional output is modeled, while the effect is less pronounced if additional impact is modeled instead.

Fourth simulation considering the strategic selection of co-authors
For the fourth simulation, we used the Stata command h_index, r(50) n(200) per(20) co(3) dp(poisson, mean(10)) dc(poisson, mean(5)) p(3) sh(.33) clear st. Compared to the baseline simulation, we additionally considered a strategic element [ st ], which focuses on the possible tendency of agents to select other agents with lower h index values as co-authors. Such a strategic element (with another focus) has been mentioned by Hirsch (2019b): “lower h-index authors have the choice to not collaborate with high h-index authors but rather pursue their own independent work, or work with more junior collaborators” (p. 1168). The strategic option of the h_index command means that a single agent with an h index as high as possible is assigned to every group of collaborating agents. Then, all other agents in the simulation are randomly allocated to the collaborating groups. Thus, the strategic option captures the idea of collaborating with lower h index agents. The strategic option gives much weight to the effect of strategic collaboration decisions in our simulations, since the agents with the highest h index values never collaborate with agents having equal or higher h index values, so that their h α index values can increase after every collaboration. Even though this specification puts a lot of emphasis on the strategic component in the collaboration process, the results of this simulation reveal the potential effect of strategic collaboration decisions on the outcome distribution. The strategic option closely follows Coleman’s (1990) classic macro-micro-macro model (i.e., “Coleman’s boat”). “The general thrust of this model is that proper explanations of macro-level change and variation entail showing how macro-states at one point in time influence the behavior of individual actors, and how these actions add up to new macro-states at a later time“ (Hedström & Swedberg, 1996, p. 296). The model assumes that individual action results from the social context in a social network.
The equivalent function call to produce the simulated data in R is: simulate_hindex(runs = 50, n = 200, periods = 20, coauthors = 3, distr_initial_papers = 'poisson', dpapers_pois_lambda = 10, distr_citations = 'poisson', dcitations_mean = 5, dcitations_peak = 3, alpha_share = .33, strategic_teams = TRUE).

Coleman’s model for the fourth agent-based simulation (see Figure 4) starts with the possible influence of a social context on the attitudes of agents (A). The current situation in science is characterized by performance-based evaluations: “Especially in universities, government funding of scientific research is increasingly based upon performance criteria. As research institutions operate more and more in a global market, international comparisons of institutions are published on a regular basis” (Moed, 2018). This situation puts pressure on agents doing science in the system. The second (B) and third (C) steps are characterized by the core components of Hedström’s (2005) DBO theory (desires, beliefs, and opportunities). (B) The second step in the macro-micro-macro model is that the social context (here: an increasing focus on performance criteria) influences the attitudes of single agents: the agents believe (given the pressure in the system) that they should increase their h α index values. As acting agents in the system, they desire to perform as well as possible in terms of bibliometric indicators. (C) The agents have several opportunities to act: they can collaborate with other agents without considering their h index values, or they can take these values into account (among other alternatives). Since the h α index can only be improved when agents publish papers with co-authors having lower h index values, the strategic option simulates this possible tendency of agents. (D) The empirical analyses of the development of h α index values for agents with low and high initial h index values across several periods of action reveal how single actions of agents lead to the social phenomenon on the macro level: the reinforcement of the Matthew effect.

Figure 4. Coleman’s (1990) macro-micro-macro model depicting the relationship between performance-based funding and reinforcement of the Matthew effect

The result of the fourth agent-based simulation is shown in Figure 5.
It is clearly visible that the strategic element significantly reinforces the Matthew effect that is already visible in the previous simulations: agents with low h index values not only have lower initial h α index values than agents with high h index values; their h α index values also increase at a significantly lower level across the periods of evaluation. Across the periods of action, the h α index value differences between both h index groups become larger and larger.

Figure 5. Results of the fourth simulation: mean h α index values for two groups of agents with low (<7) and high (>7) initial h index values

Discussion
The agent-based simulations presented in this paper follow a recent discussion in
Scientometrics about the newly introduced h α index by Hirsch (2019a). Leydesdorff et al. (2019) assumed that the use of the new index reinforces the Matthew effect in research evaluations: scientists with initially high h index values will profit disproportionally from the use of the h α index. Thus, the fear is that the use of the index enlarges a problem that is already prevalent in the science system. According to Merton (1968), the problem of the Matthew effect in science is so great that “we are tempted to turn again to the Scriptures to designate the status-enhancement and status-suppression components of the Matthew effect. We can describe it as ‘the Ecclesiasticus component’, from the familiar injunction ‘Let us now praise famous men’, in the non-canonical book of that name” (p. 58). Leydesdorff et al. (2019) even assumed – based on the definition of the h α index – that the disproportional attribution of credit by the h α index – the co-author with the highest h index receives the full credit – reflects the operation of the Matthew effect. Thus, the h α index is already the Matthew effect in operation. One cannot assume that the co-author with the highest h index contributes so much to the paper that the other co-authors can be completely disregarded in performance measurement. In this study, we abstracted from the single case and tested with various simulations whether the Matthew effect is visible on the macro level – when certain reasonable initial parameters and parameters for acting are set. The results of our study confirm what we expected from the single case: the h α index reinforces the Matthew effect. This effect can be intensified if strategic behavior of the publishing scientists and cumulative advantage effects are additionally considered in the simulation. Our study is situated in the tradition of analytical sociology, which seeks mechanism-based explanations.
These explanations try to focus on the crucial elements of a given process and to abstract from the detailed view (Hedström & Ylikoski, 2010). Agent-based modeling is a way of connecting the individual with the social level (Hedström, 2005). For studying a certain phenomenon on the macro level, the environment in which the action takes place is defined first. Then, the action runs following certain pre-defined rules. The result is a dataset that reflects the actions and the initial parameters. This dataset can be used to investigate whether the social phenomenon of interest is observable on the macro level. By varying certain parameters of an agent-based model used as a baseline, the effect of various situational elements of publishing, being cited, and collaborating on the development of the distribution of h α index values can be tested. This study is not only rooted in analytical sociology, but also in the bibliometrics-based heuristics (BBH) program. The program demands that indicators be studied empirically to determine whether (and if so, how) they can be used in certain evaluation environments. The h_index command and hindex package which we introduced in this paper can be used to simulate the use of the h index and h α index in certain pre-defined environments. Using different specifications of the command (package functions), the simulation can be adapted to the environment in which the h α index is intended to be used. In this study, we used the Stata command to test whether the Matthew effect becomes apparent when the h α index is calculated for a group of agents who collaborate, publish, and receive citations across several periods. The command could be applied, for instance, by deans of faculties to decide whether or not the h α index should be used for research evaluation. Is there a danger that the Matthew effect becomes apparent?
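A what-if check of this kind can be sketched as a minimal simulation loop. The following is illustrative Python with hypothetical parameters and stylized dynamics (only the strategic-leader mechanism, and every paper counted as h α credit), not the actual h_index/hindex implementation or a dean's real faculty data:

```python
import random

def run_matthew_check(n_agents=200, team_size=3, periods=20, seed=1):
    """Minimal what-if loop: in each period the agents with the highest
    h index lead one team each and collect h-alpha credit as alpha-authors.
    Returns, per period, the gap between the mean h-alpha values of agents
    starting with high (>7) vs. low (<7) h index values."""
    rng = random.Random(seed)
    h = [rng.randint(0, 14) for _ in range(n_agents)]   # initial h index
    h_alpha = [0] * n_agents
    low = [i for i in range(n_agents) if h[i] < 7]
    high = [i for i in range(n_agents) if h[i] > 7]
    gaps = []
    for _ in range(periods):
        # strategic teams: the n_agents // team_size highest-h agents
        # each lead one team; teammates are irrelevant for h-alpha here
        ranked = sorted(range(n_agents), key=lambda i: h[i], reverse=True)
        for leader in ranked[:n_agents // team_size]:
            h_alpha[leader] += 1   # leader is the alpha-author
            h[leader] += 1         # stylized growth of the h index
        mean = lambda group: sum(h_alpha[i] for i in group) / len(group)
        gaps.append(mean(high) - mean(low))
    return gaps
```

Under these stylized settings the difference between the two groups' mean h α values grows from period to period, mirroring the pattern of Figure 5.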
By introducing into the model some parameters (data) that the dean investigates beforehand (how many scientists are in the faculty, what the mean and the distribution of output are, etc.), the dean can study whether the Matthew effect emerges or not. The R package and Stata command allow the user to consider some strategic elements in the agent-based simulations: if the h α index is used in research evaluation processes, scientists might try to cooperate strategically with co-authors having lower h index values. The findings of our simulations reveal that the consideration of this element leads to a significant reinforcement of the Matthew effect. Using different options of the h_index command or different parameters for the hindex package functions, the agent-based simulations can consider not only strategic behavior, but also information from the literature on the usual behavior of scientists and on the distributions of publications and citations in different fields and institutions (e.g., Perianes-Rodriguez & Ruiz-Castillo, 2014). For example, we considered in our agent-based simulations that agents with higher h index values publish more frequently than agents with lower h index values. Many studies have shown that future performance depends on previous performance (Abramo, D’Angelo, & Soldatenkova, 2017; Allison, Long, & Krauze, 1982; Kwiek, 2015). We also included another element in the simulations which can be derived from the literature: authors might tend to cite highly-cited papers. Since the R package and Stata command are freely available, we encourage their use. We plan to add further functionality to them in the near future. Acknowledgement
We thank Jorge Hirsch for encouraging discussions and very useful comments on a preliminary version of this manuscript.
References
Abbasi, A., Hossain, L., & Leydesdorff, L. (2012). Betweenness centrality as a driver of preferential attachment in the evolution of research collaboration networks.
Journal of Informetrics, 6 (3), 403-412. doi: 10.1016/j.joi.2012.01.002. Abramo, G., D’Angelo, C. A., & Soldatenkova, A. (2017). How long do top scientists maintain their stardom? An analysis by region, gender and discipline: Evidence from Italy.
Scientometrics, 110 (2), 867-877. doi: 10.1007/s11192-016-2193-x. Ahrweiler, P. (2001).
Informationstechnik und Kommunikationsmanagement: Netzwerksimulation für die Wissenschafts- und Technikforschung [Information technology and communication management: Network simulation for science and technology studies]. Ahrweiler, P., Pyka, A., & Gilbert, N. (2011). A new model for university-industry links in knowledge-based economies.
Journal of Product Innovation Management, 28 (2), 218-235. doi: 10.1111/j.1540-5885.2010.00793.x. Allison, P. D., Long, J. S., & Krauze, T. K. (1982). Cumulative advantage and inequality in science.
American Sociological Review, 47 (5), 615-625. Allison, P. D., & Stewart, J. A. (1974). Productivity differences among scientists - evidence for accumulative advantage.
American Sociological Review, 39 (4), 596-606. doi: 10.2307/2094424. Backs, S., Günther, M., & Stummer, C. (2019). Stimulating academic patenting in a university ecosystem: An agent-based simulation approach.
The Journal of Technology Transfer, 44 (2), 434-461. doi: 10.1007/s10961-018-9697-x. Barabási, A. L. (2002).
Linked: The new science of networks . Cambridge, MA, USA: Perseus Publishing. Barabási, A. L., Jeong, H., Neda, Z., Ravasz, E., Schubert, A., & Vicsek, T. (2002). Evolution of the social network of scientific collaborations.
Physica A: Statistical Mechanics and its Applications, 311 (3-4), 590-614. doi: 10.1016/S0378-4371(02)00736-7. Bonitz, M., Bruckner, E., & Scharnhorst, A. (1999). The Matthew index - concentration patterns and Matthew core journals.
Scientometrics, 44 (3), 361-378. doi: 10.1007/BF02458485. Bornmann, L. (2010). Die analytische Soziologie: Soziale Mechanismen, DBO-Theorie und agentenbasierte Modelle [Analytical sociology: Social mechanisms, DBO theory, and agent-based models].
Österreichische Zeitschrift für Soziologie, 35 (4), 25-44. doi: 10.1007/s11614-010-0076-6. Bornmann, L., Butz, A., & Wohlrabe, K. (2018). What are the top five journals in economics? A new meta-ranking.
Applied Economics, 50 (6), 659-675. doi: 10.1080/00036846.2017.1332753. Bornmann, L., & Marewski, J. N. (in press). Heuristics as conceptual lens for understanding and studying the usage of bibliometrics in research evaluation.
Scientometrics . Bornmann, L., Mutz, R., Hug, S., & Daniel, H. (2011). A multilevel meta-analysis of studies reporting correlations between the h index and 37 different h index variants.
Journal of Informetrics, 5 (3), 346-359. doi: 10.1016/j.joi.2011.01.006. Campbell, D. T. (1991). Autopoietic evolutionary epistemology and internal selection.
Journal of Social and Biological Structures, 14 (2), 166-173. doi: 10.1016/0140-1750(91)90137-F. Cole, J. R., & Cole, S. (1973).
Social stratification in science. Chicago, IL, USA: The University of Chicago Press. Coleman, J. S. (1990).
Foundations of social theory . Cambridge, MA, USA: Belknap Press of Harvard University Press. de Solla Price, D. (1976). A general theory of bibliometric and other cumulative advantage processes.
Journal of the American Society for Information Science, 27 (5-6), 292-306. Distin, K. (2010).
Cultural evolution . New York, NY, USA: Cambridge University Press. Edmonds, B., Gilbert, N., Ahrweiler, P., & Scharnhorst, A. (2011). Simulating the social processes of science.
Journal of Artificial Societies and Social Simulation, 14 (4). doi: 10.18564/jasss.1842. Edmonds, B., Hernandez, C., & Troitzsch, K. G. (2007).
Social simulation: Technologies, advances and new discoveries . Hershey, New York: Information Science Reference. Egghe, L. (2006a). How to improve the h-index?
The Scientist, 20 (3), 14. Egghe, L. (2006b). Theory and practise of the g-index. Scientometrics, 69 (1), 131-152. doi: 10.1007/s11192-006-0144-7. Frandsen, T. F., & Nicolaisen, J. (2017). Citation behavior: A large-scale test of the persuasion by name-dropping hypothesis.
Journal of the Association for Information Science and Technology, 68 (5), 1278-1284. doi: 10.1002/asi.23746. Garavaglia, A., van der Hofstad, R., & Woeginger, G. (2017). The dynamics of power laws: Fitness and aging in preferential attachment trees.
Journal of Statistical Physics, 168 (6), 1137-1179. doi: 10.1007/s10955-017-1841-8. Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making.
Annual Review of Psychology, 62 , 451-482. Gigerenzer, G., Todd, P. M., & ABC Research Group. (1999).
Simple heuristics that make us smart . Oxford, UK: Oxford University Press. Gilbert, N. (1997). A simulation of the structure of academic science.
Sociological Research Online, 2 (2). Glänzel, W., & Schöpflin, U. (1995). A bibliometric study on aging and reception processes of scientific literature.
Journal of Information Science, 21 (1), 37-53. doi: 10.1177/016555159502100104. Guns, R., & Rousseau, R. (2009). Simulating growth of the h-index.
Journal of the American Society for Information Science and Technology, 60 (2), 410-417. doi: 10.1002/asi.20973. Hedström, P. (2005).
Dissecting the social: On the principles of analytical sociology . Cambridge, UK: Cambridge University Press. Hedström, P., & Swedberg, R. (1996). Social mechanisms.
Acta Sociologica, 39 (3), 281-308. Hedström, P., & Ylikoski, P. (2010). Causal mechanisms in the social sciences.
Annual Review of Sociology, 36, 49-67. doi: 10.1146/annurev.soc.012809.102632. Hirsch, J. E. (2005). An index to quantify an individual's scientific research output.
Proceedings of the National Academy of Sciences of the United States of America, 102 (46), 16569-16572. doi: 10.1073/pnas.0507655102. Hirsch, J. E. (2019a). Hα: An index to quantify an individual's scientific leadership.
Scientometrics, 118 (2), 673–686. Hirsch, J. E. (2019b). Response to comment “Hα: The scientist as chimpanzee or bonobo”, by Leydesdorff, Bornmann and Opthof.
Scientometrics, 118 (3), 1167–1172. doi: 10.1007/s11192-019-03019-w. Hodgson, G., & Knudsen, T. (2011).
Darwin’s conjecture: The search for general principles of social and economic evolution . Chicago/London: University of Chicago Press. Ionescu, G., & Chopard, B. (2013). An agent-based model for the bibliometric h-index.
The European Physical Journal B, 86 (10), 426. doi: 10.1140/epjb/e2013-40207-0. Ivanova, I. A., & Leydesdorff, L. (2014). A simulation model of the triple helix of university–industry–government relations and the decomposition of the redundancy.
Scientometrics, 99 (3), 927-948. doi: 10.1007/s11192-014-1241-7. Jann, B. (2005).
MOREMATA: Stata module (Mata) to provide various functions. Statistical Software Components S455001, Boston College Department of Economics, revised 17 May 2019. Kwiek, M. (2015). The European research elite: A cross-national study of highly productive academics in 11 countries.
Higher Education, 71 (3), 379–397. doi: 10.1007/s10734-015-9910-x. Leydesdorff, L. (2001). Technology and culture: The dissemination and the potential 'lock-in' of new technologies.
Journal of Artificial Societies and Social Simulation, 4 (3), U20-U44. Leydesdorff, L. (2015). Can intellectual processes in the sciences also be simulated? The anticipation and visualization of possible future states.
Scientometrics, 105 (3), 2197-2214. doi: 10.1007/s11192-015-1630-6. Leydesdorff, L., Bornmann, L., & Opthof, T. (2019). Hα: The scientist as chimpanzee or bonobo.
Scientometrics, 118 (3), 1163–1166. Leydesdorff, L., & van den Besselaar, P. (1998). Technological developments and factor substitution in a complex and dynamic system.
Journal of Social and Evolutionary Systems, 21 (2), 173-192. doi: 10.1016/S1061-7361(00)80004-1. Leydesdorff, L., & van den Besselaar, P. (1998). Competing technologies: Lock-ins and lock-outs. In D. M. Dubois (Ed.),
Computing anticipatory systems: CASYS'97 (pp. 309-323). New York, NY, USA: Woodbury. Leydesdorff, L., & van den Besselaar, P. (Eds.). (1994).
Evolutionary economics and chaos theory: New directions in technology studies. London, UK: Pinter. Merton, R. K. (1942). Science and technology in a democratic order.
Journal of Legal and Political Sociology, 1, 115-126. Merton, R. K. (1968). The Matthew effect in science.
Science, 159 (3810), 56-63. Merton, R. K. (1973).
The sociology of science: Theoretical and empirical investigations. Chicago, IL, USA: University of Chicago Press. Meyer, M., Lorscheid, I., & Troitzsch, K. G. (2009). The development of social simulation as reflected in the first ten years of JASSS: A citation and co-citation analysis.
Journal of Artificial Societies and Social Simulation, 12 (4), A224-A243. Moed, H. F. (2018). Assessment and support of emerging research groups.
FEMS Microbiology Letters, 365 (17), fny189-fny189. doi: 10.1093/femsle/fny189. Newman, M. E. J. (2001). Clustering and preferential attachment in growing networks.
Physical Review E, 64 (2). doi: 10.1103/PhysRevE.64.025102. Perianes-Rodriguez, A., & Ruiz-Castillo, J. (2014). Within- and between-department variability in individual productivity. The case of economics. In P. Wouters (Ed.),
Proceedings of the Science and Technology Indicators Conference 2014 Leiden “Context counts: Pathways to master big and little data” (pp. 423-430). Leiden, the Netherlands: University of Leiden. Petersen, A. M., Fortunato, S., Pan, R. K., Kaski, K., Penner, O., Rungi, A., . . . Pammolli, F. (2014). Reputation and impact in academic careers.
Proceedings of the National Academy of Sciences, 111 (43), 15316-15321. doi: 10.1073/pnas.1323111111. Popper, K. R. (1972).
Objective knowledge: An evolutionary approach . Oxford, New York: Clarendon Press. Scharnhorst, A., Börner, K., & van den Besselaar, P. (Eds.). (2012).
Models of science dynamics: Encounters between complexity theory and information sciences . Heidelberg, Germany: Springer. Simon, H. A. (1957).
Models of man, social and rational. New York, NY, USA: Wiley. Tesfatsion, L. (2002). Agent-based computational economics: Growing economies from the bottom up.
Artificial Life, 8 (1), 55-82. doi: 10.1162/106454602753694765. Tietze, A., Galam, S., & Hofmann, P. (2019). Crediting multi-authored papers to single authors. Retrieved May 13, 2019, from https://arxiv.org/pdf/1905.01943.pdf Waltman, L., & van Eck, N. J. (2012). The inconsistency of the h-index.
Journal of the American Society for Information Science and Technology, 63 (2), 406-415. doi: 10.1002/asi.21678. You, Z.-Q., Han, X.-P., & Hadzibeganovic, T. (2016). The role of research efficiency in the evolution of scientific productivity and impact: An agent-based model.
Physics Letters A, 380 (7-8), 828-836. doi: 10.1016/j.physleta.2015.12.022. Żogała-Siudem, B., Siudem, G., Cena, A., & Gagolewski, M. (2016). Agent-based model for the h-index – exact solution.