Citation gaming induced by bibliometric evaluation: a country-level comparative analysis
Alberto Baccini†, Giuseppe De Nicolao‡, Eugenio Petrovich§

Abstract
It is several years since national research evaluation systems around the globe started making use of quantitative indicators to measure the performance of researchers. Nevertheless, the effects of these systems on the behavior of the evaluated researchers are still largely unknown. We attempt to shed light on this topic by investigating how Italian researchers reacted to the introduction in 2011 of national regulations in which key passages of professional careers are governed by bibliometric indicators. A new inwardness measure, able to gauge the degree of scientific self-referentiality of a country, is defined as the proportion of citations coming from the country itself compared to the total number of citations gathered by the country. Compared to the trends of the other G10 countries in the period 2000-2016, Italy's inwardness shows a net increase after the introduction of the new evaluation rules. Indeed, globally and also for a large majority of the research fields, Italy became the European country with the highest inwardness. Possible explanations are proposed and discussed, concluding that the observed trends are strongly suggestive of a generalized strategic use of citations, both in the form of author self-citations and of citation clubs. We argue that the Italian case offers crucial insights on the constitutive effects of evaluation systems. As such, it could become a paradigmatic case in the debate about the use of indicators in science-policy contexts.
Keywords: Bibliometric Indicators, Performance-based Research Evaluation Systems, Inwardness, Research Policy, Academic Careers, Italy, G10.

∗ Funding: This work was supported by Institute For New Economic Thinking Grant ID INO17-00015.
† Department of Economics and Statistics, University of Siena, Italy; [email protected]
‡ Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy
§ Department of Economics and Statistics, University of Siena, Italy

Introduction
Starting from the late 1980s, several European and extra-European countries implemented national systems to monitor, assess, and evaluate the research performance of their scientific workforce (1, 2). One of the key features of such research evaluation systems is the focus on quantitative indicators (metrics) as crucial science policy tools (3). Accordingly, in the last years, several scientometric indicators, based on publications or citations (or on a combination of both, such as the h-index), have increasingly appeared in the academic evaluation systems, alongside the traditional peer-review-based procedures.

The use of these indicators in the evaluation of research performance has generated a heated debate in the scientific community. The advocates argue that scientometric measures are not only more objective than peer review (4); they would also improve both the quantity and the quality of the scientific production (5, 6). This would occur because the indicators are integrated within a system of incentives that rewards the achievement of the scientometric targets set by the evaluation system (7). On the other hand, critics claim that the same mechanisms that are designed to improve the research performance create at the same time room for strategic behaviors (8). For instance, when productivity is positively rewarded, the number of publications becomes a goal that can be pursued not only by positive behaviors (doing more research), but also by opportunistic strategies (e.g., slicing one scientific work into multiple publications) (9, 10). Analogously, when citations become a goal, the «citation game» starts (11). A mediating position is represented by scholars proposing a «responsible use» of metrics. According to this approach, research metrics can provide valuable insights on the research performance, granted that they are carefully designed in order to avoid unintended consequences.
Thus, a distillation of best practices has been proposed for improving the use of metrics in research assessment (12).

Recently, the idea that the consequences of the use of indicators on the behavior of researchers can be easily sorted between the intended and the unintended ones has been questioned as too simplistic (13, 14). Instead, the notion of «constitutive effects» has been advanced to capture the way in which the indicators act on the researchers (15). Within this new framework, indicators are conceived as shaping the activity of research deeply and at different levels, from citation habits to the research agenda, redefining at the same time key evaluative terms such as research quality (16). They become crucial actors in the «epistemic living spaces» of academic researchers (17), and researchers begin to «think with indicators» pervasively (18).

The main constitutive effects of the indicators described in the literature can be grouped into three main types: i) Goal displacement: scoring high on the indicators becomes a target in itself, to be achieved also by gaming the system (19, 20); ii) Risk avoidance: highly innovative, non-mainstream, and interdisciplinary research topics are avoided because they could fail to score well on indicators that tend to reward more traditional research programmes (18, 21–25); iii) Task reduction: when academic activities such as teaching and public engagement are not rewarded, academics tend to avoid them to concentrate only on publishable academic research (26–28).

Although these effects have been highly debated, until recently the evidence of their occurrence has been mainly anecdotal. It is only in the last years that the methodical empirical study of such effects has been undertaken (13, 21). In the present paper, we aim to advance the knowledge on this topic by focusing on the case of Italy.
Among European and extra-European countries, Italy is the only one in which some key career passages of scientific researchers are entirely regulated by rules based on bibliometric indicators. Thus, Italy is ideally suited to studying the response of researchers to the use of metrics in research evaluation. In particular, we will investigate whether Italian scientists have pervasively adopted a strategic use of citations in order to boost their indicators. By "pervasively", we mean that the effect of this behavior should be visible in the great majority of scientific fields, at the national level. As we will highlight in the Conclusion, the Italian case provides important insights on the constitutive effects of evaluation systems in general.

The rest of the paper is organized as follows. In the next two sections, the specificity of the Italian case is explained and the literature dealing with self-citing strategic behaviors is reviewed. Next, a new "inwardness" indicator is introduced that is sensitive to collective strategic citation behaviors at a country level. In the Data section, the procedure for retrieving the data is described, while the main findings are presented in the Results section. In the Discussion, after examining alternative explanations, we argue in favor of the emergence of a collective strategic behavior devised to meet the demands of the evaluation system. In the Conclusion, some general lessons from the Italian case are drawn.

In 2010, the Italian university system underwent a wide process of reform, regulated by Law 240/2010. The reform created the Agency for the Evaluation of the University and Research (ANVUR), a centralized agency whose main task is the monitoring and evaluation of the Italian research system. In 2011, the Agency started a research assessment exercise called VQR, relative to the period 2004-2010. A second research assessment exercise was started in 2015, relative to the period 2011-2014.
In both exercises, the evaluation of submitted articles was largely based on the automatic or semi-automatic use of algorithms fed by citation indicators (29), while other research outputs, such as books, were evaluated by peer review.

The reform also modified the recruitment and advancement system for university professors by introducing the National Scientific Habilitation (ASN). Both for hiring and promotion, having obtained the ASN has become mandatory for applying to academic positions. The bibliometric rules rely on three indicators. For the hard sciences, life sciences, and engineering, the indicators considered by ANVUR are the number of journal articles, the number of citations, and the h-index. For the social sciences and humanities, the indicators are the number of research outputs, the number of monographs, and the number of papers published in "class A" journals. At each new round of habilitation, ANVUR calculates for each of these indicators the "bibliometric thresholds" that the candidates must overcome to achieve the ASN. Candidates whose indicators do not overcome two thresholds out of three cannot be habilitated (exceptions were possible in specific circumstances only in the first edition, ASN 2012). When first introduced, the thresholds were stated to be the median values of the indicators of the permanent academic staff holding that position (associate or full professor) [Footnote: Except for the scholars in the Social Sciences and Humanities (see next section).]. To make an example, in order to obtain a full professor habilitation, the candidate was required to score better than half of the current full professors in two indicators out of three.
Applicants overcoming the fixed thresholds are then evaluated by a committee composed of five referees, who are in charge of the final decision about attributing the habilitation. Note that the focus on indicators is not confined to the national procedures but "trickles down" to the university committees in charge of recruitment and promotion, which are required to take into account production and citation metrics when they evaluate and rank the habilitated applicants. Finally, also the members of both the national habilitation and the local recruitment committees are required to overcome bibliometric thresholds.

In sum, in Italy, starting from 2011, bibliometric indicators have gained a central role not only in the national research assessment but in the entire body of the recruitment procedures. A remarkable peculiarity of the Italian system is that the indicators based on citations, used both in the habilitation procedure and in the research evaluation exercise, are calculated by including self-citations. Thus, researchers can increase their indicators just by citing their own work.

Anecdotal evidence of the adoption of strategic behaviors in the form of author self-citations has been presented by Baccini (30). Two recent studies have documented more thoroughly the rise of opportunistic behaviors in response to the ASN rules. Seeber et al. analyzed how the use of self-citations in four Italian research areas changed after the introduction of the habilitation procedure. They found that scientists in need of meeting the thresholds (i.e., those looking for habilitation as a prerequisite for tenure-track positions or promotion to full professor) did increase significantly their self-citations after 2010 (31). Scarpa et al. focused on the Italian engineering area and found an anomalous peak in the self-citation rate (i.e., the ratio of self-citations to the total number of citations) in correspondence with the second round of the habilitation procedure, in 2013 (32).
Even if the aforementioned studies have highlighted some recent behavior changes of Italian scientists, they did not address a subtler form of strategic behavior, the one based on the so-called «citation clubs» or «citation cartels». A citation club is an informal structure in which citations are strategically exchanged among its members to boost the respective citation scores (33–35). Note that this kind of strategy cannot be spotted with the standard definition of self-citation, according to which a self-citation occurs whenever the set of co-authors of the citing paper and that of the cited one are not disjoint (36, 37), because the members of a citation club need not be co-authors. In order to allow for the effects of citation clubs, we examine a particular – and not much studied – form of self-citation, namely the country self-citation (38). A country self-citation occurs whenever the set of the countries of the authors of the citing publication and the set of the countries of the authors of the cited publication are not disjoint, that is, if these two sets share at least one country (39, 40). Notably, any citation exchanged within a citation club formed by researchers working in the same country is counted as a country self-citation, even when it is not an author self-citation.

Thus, considering that most of the standard author self-citations are country self-citations too, by analyzing the country self-citations we can capture both the "classic" strategy based on author self-citations and the "elaborated" one based on citation clubs. As far as we are interested in countries and not in individual authors, we will say, for short, that a paper is "authored by a country" when at least one of its authors is from that country.

Just as not all author self-citations originate from gaming purposes, in the same way not all country self-citations are the result of opportunistic behaviors.
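The country self-citation criterion above reduces to a simple set-intersection test. The following sketch is purely illustrative (the function and the country codes are our own, not the authors' actual pipeline); it also shows why an internationally co-authored paper generates country self-citations for every country involved:

```python
# Country self-citation test: the citing and cited publications share at
# least one author country. Illustrative only; names are hypothetical.

def is_country_self_citation(citing_countries, cited_countries):
    """True if the two sets of author countries are not disjoint."""
    return not set(citing_countries).isdisjoint(cited_countries)

# A paper co-authored by Italy and France, later cited by a purely Italian
# paper: a country self-citation for both IT and FR, with no shared author.
print(is_country_self_citation({"IT"}, {"IT", "FR"}))  # True
# A German citation to the same paper is an external citation.
print(is_country_self_citation({"DE"}, {"IT", "FR"}))  # False
```

Note that the test is symmetric in the two sets: it does not matter which side contributes the shared country, which is why a single collaborative paper inflates the counts of all its countries at once.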
Indeed, the literature on author self-citations agrees on the fact that a certain amount of them is a normal byproduct of scientific communication. There are many perfectly legitimate reasons for citing one's own works, such as building on previously obtained results, avoiding repetition, and so on (41–43). By the same token, it is normal that a country has an internal exchange of citations amongst its researchers insofar as the knowledge produced by the country is used (i.e., cited) by the same country's scientific staff.

Moreover, international collaboration positively affects the number of country self-citations. In fact, the more a country collaborates with other countries, the higher will be the number of country self-citations. Take for instance a paper authored in collaboration by Italy and France. Any future citation to that paper coming from an Italian-authored or a French-authored publication will count as a country self-citation for both Italy and France, since the citing and the cited publication will share at least one country of affiliation.

In sum, country self-citations are not per se a sign of strategic behavior, since they depend both on the internal exchange of knowledge within a country and on the amount of international collaboration. Nonetheless, if the researchers of a single country initiate strategic behaviors in order to boost their citations, this is likely to produce an anomalous increase of country self-citations compared to the other countries. In order to obtain a normalized measure of country self-citations, we introduce a simple indicator of "inwardness".
For a given year and a country c, the inwardness is defined as the percentage ratio between the total number of country self-citations ($S_c$) and the total number of citations ($C_c$) of that country:

$$I_c = \frac{S_c}{C_c} \times 100 \qquad (1)$$

The minimum value of the inwardness indicator is $I_c = 0$ when a country has no self-citations; the maximum is $I_c = 100$ when a country has self-citations only, that is, $S_c = C_c$. It is easy to show that the inwardness indicator is a variant of the Relative Citation Impact ($RCI$) of a country. The $RCI$ is defined by May (44) as the ratio between the average citations per paper of a country and the average citations per paper of the world (see also (45)). The $RCI$ of the country c in a given year is

$$RCI_c = \frac{C_c}{P_c} \times \frac{P_w}{C_w}$$

where $C_c$ and $C_w$ are the total number of citations of the country and of the world, and $P_c$ and $P_w$ the publications of the country and of the world. The total number of citations is the sum of the country self-citations ($S_c$) and the external citations ($X_c$); when the world is considered, $C_w = S_w$, since obviously $X_w = 0$ [Footnote: The only exception being authors that changed their country between the citing and the cited publication.]. If a Relative Self-citation Impact is defined as

$$RSI_c = \frac{S_c}{P_c} \times \frac{P_w}{S_w},$$

the inwardness indicator can be expressed as

$$I_c = \frac{RSI_c}{RCI_c} \times 100 = \frac{S_c/P_c}{S_w/P_w} \times \frac{C_w/P_w}{C_c/P_c} \times 100 = \frac{S_c}{C_c} \times 100 \qquad (2)$$

Note that the inwardness indicator is normalized for the size of the country in terms of publications. From a conceptual point of view, the inwardness of a country is an indicator of how much the knowledge produced in the form of scientific publications in a given year in a country flows, through citations, into the knowledge produced in that country in the following years (46–48). Correspondingly, $100 - I_c$ indicates how much of the knowledge produced in a year in a country flows, through citations, into the knowledge (publications) produced by other countries (49, 50). A higher level of inwardness suggests that the knowledge produced by a country attracts mainly the interest of the national community. By contrast, a lower level suggests that the research of the country does not remain confined within its own borders but flows also toward the rest of the world.

As said above, the strategic use of citations, both as author self-citations and as citation clubs, affects the country self-citations and, hence, also the inwardness indicator. The start of a strategic use of citations at the country level should therefore be associated with an anomalous rise of the inwardness indicator. Recall, however, that inwardness is positively affected also by increases of international collaboration.
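As a numerical illustration of equations (1) and (2), the sketch below computes the inwardness both directly and via the RSI/RCI ratio, using the fact that at the world level all citations are self-citations ($C_w = S_w$). All the counts are invented for the example, not real SciVal data:

```python
# Inwardness two ways: directly as 100*S_c/C_c (eq. 1) and as the ratio
# RSI_c/RCI_c times 100 (eq. 2). All counts below are made up.

def inwardness(S_c, C_c):
    return 100.0 * S_c / C_c

def rci(C_c, P_c, C_w, P_w):                 # Relative Citation Impact
    return (C_c / P_c) * (P_w / C_w)

def rsi(S_c, P_c, S_w, P_w):                 # Relative Self-citation Impact
    return (S_c / P_c) * (P_w / S_w)

S_c, C_c, P_c = 30_000, 100_000, 20_000      # hypothetical country
P_w = 1_000_000
C_w = S_w = 5_000_000                        # the world only self-cites

direct = inwardness(S_c, C_c)
via_ratio = 100.0 * rsi(S_c, P_c, S_w, P_w) / rci(C_c, P_c, C_w, P_w)
print(direct, via_ratio)                     # both 30.0
```

The country's publication count $P_c$ cancels in the ratio, which is why inwardness is normalized for country size, as noted above.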
It is therefore necessary to control for the trend of international collaboration before concluding that an inwardness rise is due to strategic behaviors and not to an increase of international collaboration.

Data

In particular, we exported from SciVal two metrics: (1) Citation Count including self-citations, and (2) Citation Count excluding self-citations [Footnote: The data were exported from SciVal on October 16, 2018. They correspond to the last Scopus update of September 21, 2018.]. For both metrics, we included articles, reviews, and conference papers, leaving aside other types of publications. The first Citation Count metric represents the countries' total number of citations, whereas the countries' number of self-citations was obtained as the difference between (1) and (2) [Footnote: Note that SciVal's definition is binary and non-fractional: a citation can either be a self-citation or not (51). The weight of a country self-citation remains always 1, irrespective of the number of countries producing the citing or the cited publications: if an Italian publication is cited by another Italian publication, this self-citation will have the same weight as if the same publication was cited by an international Italo-French-Chinese publication.].

We retrieved the data for the G10 countries (Belgium-BE, Canada-CA, France-FR, Germany-DE, Italy-IT, Japan-JP, the Netherlands-NL, Sweden-SE, Switzerland-CH, United Kingdom-GB, United States-US). In order to study the spread of the strategic behavior in different research areas, data were exported for all the Scopus fields aggregated, i.e., without any filter for subject area, and for each of the 27 Scopus Main Categories (total number of datasets = 28), for the years 2000-2016 included. In order to account for the effect of international collaboration on the inwardness indicators, we retrieved from SciVal also the Percentage of International Collaboration metric for the target countries. The percentage of international collaboration for a country in a given year is defined as the share of publications of the country coauthored with at least one different country. The graphs were implemented in R by using the package "ggplot2" (52).

Table 1: Inwardness variations (percentage points) for the G10 countries.

Country              Δ (2000-2008)   Δ (2008-2016)   Δ tot (2000-2016)
Belgium                  1.42            3.29             4.72
Canada                   1.04            3.43             4.46
France                   1.57            2.68             4.25
Germany                  1.69            5.47             7.17
Italy                    1.82            8.29            10.11
Japan                    0.60            3.20             3.81
Netherlands              2.00            3.54             5.54
Sweden                   0.94            3.32             4.27
Switzerland              0.94            3.32             4.27
United Kingdom           1.45            4.40             5.85
United States            0.14            2.87             3.01
Mean G10 countries       1.24            3.98             5.22
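The derivation of the self-citation counts and of the inwardness series from the two exported metrics can be sketched as follows. The yearly figures below are invented placeholders, not the actual SciVal exports:

```python
# Self-citations = Citation Count including self-citations (metric 1)
# minus Citation Count excluding self-citations (metric 2);
# inwardness = 100 * self-citations / total citations. Numbers invented.

citations_incl = {2014: 90_000, 2015: 95_000, 2016: 100_000}  # metric (1)
citations_excl = {2014: 66_000, 2015: 68_000, 2016: 69_000}   # metric (2)

self_cites = {y: citations_incl[y] - citations_excl[y] for y in citations_incl}
inwardness = {y: 100.0 * self_cites[y] / citations_incl[y] for y in citations_incl}

# Delta over a sub-period, as in Table 1 (percentage points):
delta_2014_2016 = inwardness[2016] - inwardness[2014]
for y in sorted(inwardness):
    print(y, f"{inwardness[y]:.2f}%")
print(f"delta 2014-2016: {delta_2014_2016:.2f} p.p.")
```

The same two-metric subtraction applies unchanged at the level of a single Scopus Main Category, which is how the 27 per-field series can be built.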
Results

Figure 1 shows the trend of the inwardness over time for the eleven target countries (all Scopus fields aggregated). All countries share a rather similar profile, with apparent differences in the absolute value. The ranking is partially explained by the size of the scientific production of the countries. Countries with a large scientific output, such as the United States, naturally attract more citations from their own production, simply because they have more citing and citable articles than smaller countries such as Belgium. For all the countries under analysis, not only does the inwardness increase slowly and regularly over time, but the yearly ranks of countries according to their inwardness are remarkably stable.

In this landscape, Italy stands out as a notable exception. In 2000, at the beginning of the period, Italy has an inwardness of 20.62% and ranks sixth, just behind the UK. In 2016, at the end of the period, Italy ranks second, with an inwardness of 30.73%. Note that, until 2009, Italy's inwardness grows parallel to those of comparable countries (UK, Germany, France). However, around 2010, the Italian trend shows a sudden acceleration. In the following six years, Italy overcomes the UK, Germany, and Japan, becoming the first European country and the second one in the G10 group.

Table 1 shows the variations (deltas) of the inwardness for each country, for the whole period and for the two sub-periods 2000-2008 and 2008-2016. Note that in the first period, Italy's increase is in line with other countries, while in the second period (2008-2016), Italy exhibits the largest inwardness delta: 8.29 p.p., more than 4 p.p. above the G10 average and almost 3 p.p. above Germany. As a result, Italy is by far the country with the highest inwardness delta also in the whole period 2000-2016 (10.11 p.p. vs 5.22 of the G10 average).

However, as already said, inwardness is affected by the amount of international collaboration of a country.
Figure 1: Inwardness for G10 countries (2000-2016). Source: elaboration on SciVal data.

In order to allow for this effect, in Figure 2, inwardness is plotted against the average international collaboration score of each country. More precisely, inwardness at year Y is plotted against the three-year moving average of international collaboration calculated starting from year Y. In fact, inwardness at year Y depends also on citations coming from publications that appeared in the following years (53).

The data show indeed a positive relation between the two variables: for all the countries, inwardness grows with the average international collaboration. The plot shows a peculiar trajectory for Italy. Although for most years Italy ranks last in Europe for international collaboration (x-axis), nevertheless, at the end of the period, it is the first European country for inwardness (y-axis). Before 2010, Italy is close to and moves together with a group of three European countries, namely Germany, the UK, and France. Starting from 2010, Italy departs from the group along a steep trajectory, to eventually become the European country with the lowest international collaboration and the highest inwardness.

Until now, we focused on the aggregated output of the target countries, without considering the different research areas (Scopus Main Categories). In order to investigate whether and how inwardness changes across research areas, we calculated the inwardness time series for each of the 27 Scopus Main Categories. The time series, as well as the scatterplots of the inwardness against the international collaboration, are fully provided in the Supplementary Materials. For reasons of space, these data are summarized in Figure 3, where the variation of the inwardness indicator in the periods 2000-2008 (A) and 2008-2016 (B) is displayed for each of the 27 Scopus Categories. Italy shows a remarkable difference between the two periods.
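The forward-looking smoothing described above can be sketched as follows. This is our reading of the text (the average runs over years Y, Y+1, and Y+2, because citations at year Y partly come from later publications); the percentages are illustrative, not real data:

```python
# Three-year moving average of international collaboration "starting from"
# year Y, i.e., over Y, Y+1, Y+2. Values are invented for illustration.

def forward_moving_average(series, year, window=3):
    """Mean of series[year], ..., series[year + window - 1]."""
    return sum(series[year + k] for k in range(window)) / window

intl_collab = {2009: 40.0, 2010: 41.0, 2011: 42.5, 2012: 43.0}  # % per year
print(round(forward_moving_average(intl_collab, 2009), 2))  # 41.17
print(round(forward_moving_average(intl_collab, 2010), 2))  # 42.17
```

A forward window, rather than the usual backward one, is the design choice that matches the citation lag: the inwardness value of year Y is paired with the collaboration level of the years whose papers actually cite year Y's output.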
In the first one (Figure 3A), before the university reform, Italy is in line with the other G10 countries in most of the research fields. In the second period, after the reform, Italy stands out with the highest inwardness increase in 23 out of 27 fields. The only exceptions are earth and planetary sciences (EPS), multidisciplinary (MUL), nursing (NUR), and physics and astronomy (PA).

As we show in the Supplementary Materials, the inwardness increase is not matched by a parallel increase of the international collaboration at the field level. In particular, at the end of the period, Italy is the European country with the lowest level of international collaboration and the highest value of inwardness in the following Scopus Categories (11 out of 27): agricultural and biological sciences (ABS), biochemistry, genetics and molecular biology (BGMB), chemical engineering (CE), economics, econometrics and finance (EEF), earth and planetary sciences (EPS), environmental science (ES), immunology and microbiology (IM), pharmacology, toxicology and pharmaceutics (PTP), veterinary (VET). In 9 other Categories, Italy is first for inwardness but not lowest for international collaboration: business, management and accounting (BMA), computer science (CS), dentistry (DEN), decision sciences (DS), engineering (ENG), health professions (HP), mathematics (MAT), materials science (MS), psychology (PSY). Note that the Italian production in the arts and humanities (AH) and social sciences (SOC) is only partially covered by Scopus, as a large part is published in books and in the national language. Therefore, the results about these scholarly areas should be taken with great caution (54).
Figure 2: Inwardness versus average international collaboration for the G10 countries. The average international collaboration is the 3-year moving average calculated starting from the considered year. The international collaboration is defined as the share of publications of a country coauthored by at least a coauthor of a different country. Source: elaboration from SciVal data.

Figure 3: Inwardness delta in Scopus Main Categories in the periods 2000-2008 (A) and 2008-2016 (B). ABS = Agricultural and Biological Sciences, AH = Arts and Humanities, BGMB = Biochemistry, Genetics and Molecular Biology, BMA = Business, Management and Accounting, CE = Chemical Engineering, CHEM = Chemistry, CS = Computer Science, DEN = Dentistry, DS = Decision Sciences, E = Energy, EEF = Economics, Econometrics and Finance, ENG = Engineering, EPS = Earth and Planetary Sciences, ES = Environmental Science, HP = Health Professions, IM = Immunology and Microbiology, MAT = Mathematics, MED = Medicine, MS = Materials Science, MUL = Multidisciplinary, NEU = Neuroscience, NUR = Nursing, PA = Physics and Astronomy, PSY = Psychology, PTP = Pharmacology, Toxicology and Pharmaceutics, SOC = Social Sciences, VET = Veterinary. Source: elaboration from SciVal data.

Discussion

As seen from Figure 1 and Table 1, Italy shows a different trend compared to the other G10 countries. The most notable aspect is that, after 2009, Italy's inwardness grows faster. The acceleration is about synchronous with the launch of the national assessment exercise in 2011 and the opening in 2012 of the ASN, the new scientific habilitation system, whose bibliometric criteria, largely relying on citations, had been announced in 2011. A likely explanation of the anomalous trend is that the Italian scientific community reacted to the bibliometric thresholds set by ANVUR by citing more frequently the Italian scientific production. More specifically, we argue that the change in the citation behavior is due to the widespread adoption, by Italian researchers, of strategies for boosting the bibliometric indicators set by ANVUR. As said in the Introduction, such strategies include both the artificial increase of author self-citations and the creation of nationally-based citation clubs. The Italian anomalous trend is possibly the joint result of these two strategic answers to the incentives of the evaluation system. The slight discrepancy between the starting year of the inwardness acceleration and the launch of the bibliometric evaluation system, with the former occurring slightly earlier than the latter, is easily explained by the "backward effect" typical of citation measures.
Any change in citation habits taking place in a given year produces a backward effect on the citation scores of the previous years: because researchers cite previously published papers, the change reverberates also on the citation scores of the past production. The Italian ASN used time horizons of 10 and 15 years for counting citations and for calculating the h-indexes of applicants and referees. Citations received by the most recent articles have a more lasting effect on the calculation of forthcoming indicators. It is therefore more convenient to self-cite one's own recent production rather than the remote one. Hence, a strategic reaction to rules introduced in 2011 is expected to produce an inwardness acceleration that starts a few years before, just as observed for Italy.

Two alternative explanations of the data could be advanced. The Italian acceleration may be due to a sudden rise, after 2009, of the amount of international collaboration. In fact, we have already observed that, other things left unchanged, an increase of international collaboration positively affects the inwardness indicator. However, Figure 2 rules out this alternative explanation: no peculiar increase in the Italian international collaboration can be spotted. A second alternative explanation argues that the inwardness acceleration may be due to a narrowing of the scientific focus of Italian researchers, i.e., to a dynamic of scientific specialization which led to a growth of author self-citations (31). The idea is that focusing on narrower topics results in a contraction of the scientific community of reference. Thus, the number of citable papers would diminish and the chances for author self-citation would correspondingly increase, generating also the growth of country self-citations. Actually, no evidence can be shown that directly falsifies the specialization explanation.
Nonetheless, this explanation appears implausible because it would imply that Italian researchers in all fields have suddenly redirected their focus to topics mainly investigated in the national community. This change of behavior would be not only peculiar to Italy, but also so strong as to lead Italy to diverge from the other G10 countries in terms of inwardness. Notably, Figure 3 shows that the post-2008 acceleration is visible in most of the research areas in Italy. Not only has the change in behavior been generalized, regarding most of the fields of research, but in some fields, such as engineering (ENG), mathematics (MAT), or veterinary (VET), the increase reached outstanding proportions. In any case, it would still be necessary to explain why specialization occurred only in Italy and at the same time as the adoption of the new rules for evaluation.

Conclusion
In this paper, we contributed to the empirical study of the constitutive effects that indicator-based research evaluation systems have on the behavior of the evaluated researchers. By focusing on the Italian case, we investigated how the Italian scientific community responded, at the national level, to the introduction of a research evaluation system in which bibliometric indicators play a crucial role. Our results show that the behavior of Italian researchers did change after the introduction of the evaluation system following the 2010 university reform. Such a change is visible at the national scale in most scientific fields. We explained it as the result of the pervasive adoption of strategic citation behaviors within the Italian scientific community. In particular, the inwardness indicator was able to track the effects of two types of citation strategies: the opportunistic use of author self-citations and the creation of citation clubs whose members exchange citations. Even if further research is needed to assess the respective weight of these two strategies, it is their joint presence that best explains the peculiar trend of Italian inwardness, which exhibits a marked acceleration after 2010.

In sum, the comparative analysis of the inwardness indicator showed that Italian research grew in insularity in the years after the adoption of the new evaluation rules. Indeed, the results show that, both globally and in many research fields, while the level of international collaboration remained stable and comparatively low, the research produced in the country tended to be increasingly cited by papers authored by at least one Italian scholar. Put another way, the share of citations to Italian articles received from articles authored exclusively by non-Italian authors sharply decreased after 2010.

We believe that three main lessons can be derived from the Italian case.
Firstly, our results confirm that scientists respond quickly to the system of incentives in which they act (31). Thus, any policy aiming at introducing or modifying such a system should be designed and implemented very carefully. In particular, considerable attention should be paid to the constitutive effects of bibliometric indicators: they are not neutral measures of performance, but actively interact with and quickly shape the behavior of the evaluated researchers.

Secondly, our results show that the «responsible use» of metrics would not be enough to prevent the emergence of strategic behaviors. For instance, the Leiden Manifesto recommends the use of a «suite of indicators» instead of a single one as a way to prevent gaming and goal displacement (see principle number 9 in (12)). The Italian case shows that, even when researchers are evaluated against multiple indicators, as recommended, strategic behaviors manifest themselves anyway.

Lastly, our results prompt some reflections on the viability of mixed evaluation systems, in which indicators are intended to complement or integrate the expert judgment expressed by peer review. In fact, the Italian system was designed in principle according to such a mixed approach, both for the research assessment exercises, where research products were evaluated by bibliometric indicators or by peer reviewers, and for the ASN, where overcoming the bibliometric thresholds is but a necessary condition for being admitted to the final evaluation by the habilitation committees. Nonetheless, our results show that the mere presence of bibliometric indicators in the evaluation procedures is enough to structurally affect the behavior of scientists, fostering opportunistic strategies. There is therefore a concrete risk that, in mixed evaluation systems, the indicator-based component overcomes the peer review-based one, so that they de facto collapse into indicator-centric approaches.
We believe that further research is needed to better understand and fully appreciate the possibility of such a collapse. In the meantime, we suggest that policy makers exercise the utmost caution in the use of indicators in science-policy contexts.

References
1. Hicks D (2012) Performance-based university research funding systems. Research Policy.
2. The changing governance of the sciences: the advent of research evaluation systems (Springer, Dordrecht, the Netherlands).
3. Haustein S, Larivière V (2015) The Use of Bibliometrics for Assessing Research: Possibilities, Limitations and Adverse Effects. Incentives and Performance, eds Welpe IM, Wollersheim J, Ringelhan S, Osterloh M (Springer International Publishing, Cham), pp 121–139.
4. Moed HF (2005) Citation analysis in research evaluation (Springer, Dordrecht).
5. Geuna A, Martin BR (2003) University Research Evaluation and Funding: An International Comparison. Minerva.
6. Scientometrics.
7. Overview of models of performance-based research funding systems. Performance-Based Funding for Public Research in Tertiary Education Institutions (OECD), pp 23–52.
8. Edwards MA, Roy S (2017) Academic Research in the 21st Century: Maintaining Scientific Integrity in a Climate of Perverse Incentives and Hypercompetition. Environmental Engineering Science.
9. Research Evaluation.
10. Handbook of Quantitative Science and Technology Research, eds Moed HF, Glänzel W, Schmoch U (Springer, Dordrecht), pp 389–405.
11. Biagioli M (2016) Watch out for cheats in citation game. Nature.
12. Hicks D, Wouters P, Waltman L, de Rijcke S, Rafols I (2015) Bibliometrics: The Leiden Manifesto for research metrics. Nature 520:429–431.
13. Rijcke S de, Wouters PF, Rushforth AD, Franssen TP, Hammarfelt B (2016) Evaluation practices and effects of indicator use—a literature review. Research Evaluation.
14. Journal of Informetrics.
15. Public Management Review.
16. KNOW: A Journal on the Formation of Knowledge.
17. Knowing and living in academic research: convergences and heterogeneity in research cultures in the European context (Institute of Sociology of the Academy of Sciences of the Czech Republic, Prague).
18. Müller R, de Rijcke S (2017) Thinking with indicators. Exploring the epistemic impacts of academic performance indicators in the life sciences. Research Evaluation.
19. Research Evaluation.
20. Reforming Higher Education, eds Musselin C, Teixeira PN (Springer Netherlands, Dordrecht), pp 65–80.
21. Fochler M, Felt U, Müller R (2016) Unsustainable Growth, Hyper-Competition, and Worth in Life Science Research: Narrowing Evaluative Repertoires in Doctoral and Postdoctoral Scientists' Work and Lives. Minerva.
22. How should research be organised? (College Publications, London).
23. Laudel G, Gläser J (2014) Beyond breakthrough research: Epistemic properties of research and their consequences for research funding. Research Policy.
24. Cambridge Journal of Economics.
25. European Journal of Analytic Philosophy.
26. Aslib Journal of Information Management.
27. Publish-or-perish culture: A worldwide survey. Journal of the American Society for Information Science and Technology.
28. Cambridge Journal of Education.
29. Scientometrics.
30. Performance-based incentives, research evaluation systems and the trickle-down of bad science.
31. Research Policy.
32. Scientometrics.
33. CEES Occasional Paper Series.
34. Nature.
35. Frontiers in Physics 4. doi:10.3389/fphy.2016.00049.
36. Glänzel W, Thijs B, Schlemmer B (2004) A bibliometric approach to the role of author self-citations in scientific communication. Scientometrics.
37. Journal of Information Science.
38. Scientometrics.
39. Research Metrics Guidebook. Available at: https://tinyurl.com/y2lq8f63.
40. Tagliacozzo R (1977) Self-Citations in Scientific Literature. Journal of Documentation.
41. Scientometrics.
42. Garfield E (1979) Is citation analysis a legitimate evaluation tool? Scientometrics.
43. Journal of the American Society for Information Science and Technology.
44. Science.
45. Science and Public Policy.
46. The sociology of science: theoretical and empirical investigations (Univ. of Chicago Pr, Chicago).
47. Kaplan N (1965) The norms of citation behavior. Prolegomena to the footnote. American Documentation.
48. Scientometrics.
49. Journal of the American Society for Information Science and Technology.
50. Journal of the American Society for Information Science and Technology.
51. Scientometrics.
52. Displaying time series, spatial, and space-time data with R (CRC Press, Taylor & Francis Group, Boca Raton, FL).
53. Garfield E (1972) Citation Analysis as a Tool in Journal Evaluation. Journals can be ranked by frequency and impact of citations for science policy studies. Science 178(4060):471–479.
54. Nederhof AJ (2006) Bibliometric monitoring of research performance in the Social Sciences and the Humanities: A Review. Scientometrics 66(1):81–100.