A Disciplinary View of Changes in Publications' Reference Lists After Peer Review
Aliakbar Akbaritabar∗ Dimity Stephen†

Abstract
This paper provides insight into the changes manuscripts undergo during peer review, the potential reasons for these changes, and the differences between scientific fields. A growing body of literature is assessing the effect of peer review on manuscripts; however, much of this research currently focuses on the social and medical sciences. We matched more than 6,000 preprint-publication pairs across multiple fields and quantified the changes in their reference lists. We also quantified the change in references per full-text section for 565 pairs from PLOS journals. In addition, we conducted manual checks of a randomly chosen sample of 98 pairs to validate our results, and undertook a qualitative analysis based on the context of the reference to investigate the potential reasons for reference changes. We found 10 disciplines, mostly in the natural sciences, with high levels of removed references. Methods sections undergo the most relative change in the natural sciences, while in the medical and health sciences, the results and discussion sections underwent the most changes. Our qualitative analysis identified issues with our results due to incomplete preprint reference lists. In addition, we deduced 10 themes for changing references during peer review. This analysis suggested that manuscripts in the natural and medical sciences undergo more extensive reframing of the literature used to situate and interpret the results of studies than those in the social and agricultural sciences, which are further embedded in the existing literature through peer review. Peer review in engineering tends to focus on methodological details. Our results are useful to the body of literature examining the effectiveness of peer review in fulfilling its intended purposes.
Keywords
Introduction
Academic publishing is a social process. Academics do not develop their ideas in isolation, but draw on prior work from other members of the academic community to formulate hypotheses and devise studies. Most of the time, academics then cite these prior works when writing manuscripts to build their arguments and demonstrate where their study belongs in the larger body of academic literature. On the one hand, academics use citations to persuade other members of the academic community of the soundness of their claims (Gilbert, 1977), to agree with previous literature, or to note their differences and potential disagreements (Bruggeman, Traag, & Uitermark, 2012; Murray, 2020). On the other hand, there is evidence that social processes affect which studies are cited beyond the academic merit of a study. Examples include higher rates of self-reference of national colleagues (Khelfaoui, Larrègue, Larivière, & Gingras, 2020) or the tendency for the work of female academics to receive fewer citations than that of their male peers (Maliniak, Powers, & Walter, 2013; Dworkin et al., 2020). Thus, the act of citing prior work is affected by social factors, alongside considerations such as the relevance of the work to the author's current study and its academic merit. Cumulatively, these citing behaviours can have career implications for academics via their influence on the impact of these previous studies and the reputation of their authors, and in the longer term provide a "cumulative advantage" for the previously more cited authors (Merton, 1968) in the shape of an upward spiral.

∗ Max Planck Institute for Demographic Research (MPIDR), Laboratory of Digital and Computational Demography, Rostock, Germany; [email protected]; [email protected]; ORCID = 0000-0003-3828-1533 (Corresponding Author)
† German Centre for Higher Education Research and Science Studies (DZHW), Berlin, Germany; [email protected]; ORCID = 0000-0002-7787-6081
Furthermore, although academics may author a publication, they do not solely determine a publication's content. Journal editors and peer reviewers also play key and distinct roles in influencing the content of manuscripts through the academic publication process. Editors screen submitted manuscripts for relevance to the journal's topical orientation and evidence of misconduct, before tasking peer reviewers with providing their expert opinions on the manuscript's soundness and contribution to existing knowledge, then applying these insights in a final decision about publishing the manuscript (Hirschauer, 2010). Editors and peer reviewers are individuals working in the same academic system, likely as academics themselves, and are affected by similar factors shaping citing behaviour (Teplitskiy, Acuna, Elamrani-Raoult, Körding, & Evans, 2018). There are also social processes at work in determining editorial board memberships (Miniaci & Pezzoni, 2020). Editors and reviewers have specific goals, ideas and values of "an ideal scientific contribution", and editors evaluate academic manuscripts submitted to their journals in the light of these ideals (Hengel, 2017). On a larger scale, this empowers particular academic schools of thought through editorial decisions and strategies (Teplitskiy et al., 2018). It fosters a mainstream line of thought that can in turn penalize novel and innovative academic work (Hofstra et al., 2020) that would be disruptive to the larger body of literature accepted by a given subset of the community (Wu, Wang, & Evans, 2017). Thus, one or multiple normalizing processes are at work in shaping what subjects should be studied, which prior works should be cited, and how and in which form academic ideas should be expressed to be accepted for publication in mainstream outlets. Clearly then, the peer review process is a major cornerstone of the academic system.
More formally, peer review is broadly agreed upon by the academic community to serve two functions: it should improve the quality of a manuscript as a communication tool, and it should screen studies for academic rigour to identify their shortcomings so that they may be addressed, or reject those studies that are irretrievably flawed (De Vries, Marschall, & Stein, 2009). Nevertheless, peer review could play other, less explicit roles for the academic system beyond these two main functions (Hirschauer, 2010). In the context of a growing body of research questioning the effectiveness of peer review in performing these two functions, several studies have investigated the effect peer review has on manuscripts. These have been conducted through diverse means, such as surveys of authors' experiences with changes requested by peer reviewers (Strang & Siler, 2015), comparing text and referencing changes in manuscripts before and after peer review (Teplitskiy, 2016; Klein, Broadwell, Farb, & Grappone, 2019), and assessing post-review manuscripts on, for instance, readability (Hengel, 2017), adherence to reporting guidelines (Hopewell et al., 2014; Carneiro et al., 2020), or multiple aspects of quality as assessed via checklists (Goodman, Berlin, Fletcher, & Fletcher, 1994; Roberts, Fletcher, & Fletcher, 1994) or judged by readers and journal editors (Jefferson, Wager, & Davidoff, 2002).

Many of these studies of changes resulting from peer review have largely focused on the social and medical sciences. Given the different methods and foci of disciplines, the aim of this study was to understand how the peer review process influences manuscripts in different disciplines, as measured through changes in reference lists and the sections most altered between the submitted and published versions.
The paper is structured as follows: we first describe previous literature examining the effects of peer review on manuscripts. We then present our data and analytical strategy in the methods section, followed by a presentation of our results. Finally, we contextualise our findings in relation to previous studies, draw conclusions and discuss the broader impact of our results for the academic system.

Background
Here we first introduce prior studies that took a large-scale, quantitative view of peer review as a field of research. We also look into disciplinary differences in editorial practices. In addition, we summarise the findings of the studies we identified that have assessed the effects of peer review on changes in the content of manuscripts.

Batagelj, Ferligoj, & Squazzoni (2017) present a quantitative study of all publications indexed in Web of Science with peer review or refereeing as their subject (about 23,000 records). They conclude that a new field of research has emerged, i.e., studies of peer review. Using citation networks and main path analysis, they identified 47 publications that can be considered the most influential studies in this field. They emphasized three historical periods in which social sciences, biomedical journals and, more recently, specialist journals of science studies have dominated the studies of peer review.

Miniaci & Pezzoni (2020) investigated factors that can affect researchers' chances of membership in the editorial boards of economics journals. They found that merit and academic productivity play a role in determining who is selected as a member of the editorial boards, but, after controlling for all merit-based factors, they found that there is a social process at work. Former colleagues, proteges and co-authors of the editor-in-chief have higher chances of being selected as prospective editorial board members.

Hyland & Jiang (2020) studied 850 review note extracts posted on a website from 2014 to mid-2019 which were extremely critical of the reviewed manuscripts and harsh in the language used by the reviewers. They highlight the fact that although peer review is a cornerstone of the academic system, there is still a need for editors to control the process and prevent the detrimental aspects of non-constructive reviews.
In addition, Pranić, Malički, Marušić, Mehmani, & Marušić (2020) found that authors are more satisfied with constructive review notes and editors think these constructive reviews better guide the editorial decision-making process.

Casnici, Grimaldo, Gilbert, & Squazzoni (2017) studied the review data on 915 submissions to a multi-disciplinary journal. They found that reviewers from different disciplinary backgrounds behave differently in their reviews and write different review notes. Junior reviewers took less time and used kinder language towards the authors. Nevertheless, they found that multi-disciplinary journals can establish agreements and evaluation standards on good academic work, which requires extensive efforts from the editorial board members and editors-in-chief of journals. This will in turn nurture more impactful multi-disciplinary research instead of suppressing it.

Recent studies of the effect of peer review on manuscripts suggest that peer review, at least in the social sciences, disproportionately focuses on its first function – critically analysing the theoretical framing of studies – rather than the second function of ensuring the studies are methodologically valid (Strang & Siler, 2015; Teplitskiy, 2016). Strang & Siler (2015), using a qualitative analysis of articles published in Administrative Science Quarterly between 2005 and 2009 and supported by a survey of the authors, found that the sections of the articles pertaining to the theoretical framing of studies and the interpretation of results were the sections most altered during peer review, while methodological sections were largely unchanged. Papers' reference lists grew on average by 26% between submitted and published versions; however, this was usually not simply a case of adding citations.
Reference lists typically underwent extensive change in line with the degree of criticism around framing and interpretation during review, with references removed and added based on the change in interpretation. Teplitskiy (2016) compared manuscripts in quantitative sociology before and after peer review and similarly found that the manuscripts predominantly changed in their theoretical framing of the study, rather than in the methodology or results. These changes were applied to embed results in a theoretical framework and convey a better relation to the body of literature in sociology (Teplitskiy, 2016). However, this focus on theoretical reframing has the potential to redirect authors' time into reframing studies that are otherwise methodologically sound and acceptable for publishing, and to homogenise research in line with the field's accepted frameworks or "schools of thought" (Teplitskiy et al., 2018).

In comparison to these studies from the social sciences, an analysis of arXiv and bioRxiv preprints and their published versions, which were predominately in the physics and biology disciplines, found there was very little change in the titles, abstracts or manuscript text between versions. Around 80% of body sections compared between physics preprints and manuscripts were identical or nearly so, and there was only slightly more variability in biology manuscripts (Klein et al., 2019). This suggests that peer review in these disciplines does not have the same focus on theoretical reframing as the social sciences, and indeed perhaps does not substantially add to manuscripts at all, though it may control the methodological soundness. Note that these preprint servers might host manuscripts from specific disciplines, and thus the conclusions of studies using only one of these servers might not be generalizable to other disciplines.
Furthermore, depositing a preprint on a server prior to publication is highly skewed toward specific disciplines, and there are still disciplinary journals that do not allow authors to post preprint versions of manuscripts submitted to their journal.

Another series of studies has examined the effect of peer review on manuscripts in the medical sciences. Goodman et al. (1994) compared pre- and post-peer review medical manuscripts using a checklist of 34 items administered by physicians. They found that peer review modestly improved overall scores of manuscript quality, and 5 individual items significantly improved, pertaining to the generalisability and certainty of results, and the weight authors gave to the results. Another study also found that the readability of medical manuscripts was slightly improved by peer review, and the median length of articles increased by 2.6% (Roberts et al., 1994). More recently, Hopewell et al. (2014) found, in a study examining adherence to CONSORT reporting guidelines for clinical trials, that peer review was ineffective in detecting deficient reporting of methods and results. This may have extended from the tendency for ensuring adherence to reporting guidelines to fall within the editor's realm of responsibility; overall, however, the peer review process generally prompted only a small number of changes in manuscripts. Further, although most changes improved the quality of reporting, there were also a number of cases where review negatively influenced manuscripts by adding unplanned analyses. Carneiro et al. (2020) similarly found peer review marginally increased reporting quality in their sample of bioRxiv preprints and associated publications, but also that 27% of their pairs decreased in reporting quality between versions.
Together these studies suggest peer review has mixed effects on medical manuscripts, but generally slightly improves their content.

In addition to changes suggested by well-intentioned reviewers, unethical review practices can also influence manuscripts under development. Such practices include coercive citations, wherein the editor or reviewer unnecessarily requests citations to their own work or work from the journal in which the manuscript is under review. Coercive citation is not an uncommon practice by both reviewers and editors. Wilhite & Fong (2012) found that 20% of nearly 7,000 social scientists had been coerced to add citations to their manuscripts and more than 20% of journals in these fields engaged in coercive practices. Meanwhile, nearly a third of all citations that reviewers recommended were to their own works, and requests for citing the reviewer occurred more often when the reviewer recommended the manuscript be accepted or revised than rejected (Thombs et al., 2015). However, in a much larger sample of nearly 55,000 reviewers, Baas & Fennell (2019) recorded only 0.79% as engaging in citation manipulation. As such, reference changes during peer review may reflect a certain degree of unscrupulous peer review practices, in addition to legitimate revisions of the manuscripts.

As the majority of research into the effect of peer review on manuscripts has occurred in the social and medical sciences, and academic disciplines have varying methods and foci, in this study we investigate how peer review influences manuscripts and whether there are identifiable differences between disciplines in the effect of peer review. To do this, we match preprints from bibliometric databases to their published form in a sample of more than 6,000 pairs, and then quantify the changes made to reference lists during peer review across multiple disciplines. Further, we identify the sections of publications that undergo the most extensive referencing changes during peer review.
This broad examination enables us to identify whether there are any disciplinary effects in changes to manuscripts during peer review, and what those differences are. In addition, we carry out an extensive gold standard validation of preprint-publication pairs through manual checks of the reference list changes and, via a qualitative analysis of the text surrounding referencing changes, we reflect on the potential reasons for the observed changes.

We acknowledge there are limitations to using references as the measurement of the effect of peer review. For instance, changes may be made to the manuscript which do not affect referencing, and these changes will not be captured through our method. However, Strang & Siler (2015) observed moderate correlations between the intensity of peer review critiques and the extent of bibliographic changes made between manuscript versions, with more intense criticism typically associated with greater bibliographic change (r = ).

Methods
One of the main obstacles hindering studies on reference list changes is the lack of bibliometric databases indexing preprints as a publication type. Dimensions is one of the most recent databases that provides this type of metadata on a large scale. Thus, we matched preprints from the Dimensions database (Dimensions Resources, 2019) to subsequent publications in journals. We adopted three methodological strategies here: 1) a broad descriptive and quantitative view of all publications and changes in reference lists, 2) a more fine-grained view of publications in PLOS journals with a further probe into references' locations in the full-text, in addition to reference list changes, and 3) a manual gold standard analysis of a subset of our data to assess the robustness of our conclusions.

As shown in Figure 1, we first identified all preprints indexed in Dimensions using the in-house database of 26th of April 2019 maintained by the German Competence Centre for Bibliometrics (KB). Dimensions includes preprints from popular servers such as arXiv, medRxiv, bioRxiv, SSRN and OSF, to name a few. There are a few notable limitations to Dimensions: it covers only a subset of all preprints that are deposited in preprint servers, and its coverage of reference lists is not complete. In addition, Dimensions, like other bibliometric databases, has a biased coverage of different disciplines, and lower coverage for the social sciences and humanities has been observed. Bibliometric databases usually merge the preprint records (if they cover any) with the published version of the same record, which makes accessing and tracking changes in the preprint versus published version's reference lists a difficult task.
Further, if there are items that are not yet disambiguated in the Dimensions database, they are not included in the reference lists, which affects both sides of the comparison (i.e., both preprints' and published items' reference lists).

Our search of Dimensions returned 373,563 preprints deposited to preprint servers between 2000 and 2018, of which 25,032 had reference lists. We then matched these preprints to publications indexed in the KB's in-house database of Clarivate's Web of Science (WOS). We matched preprints to publications published in the same year or the subsequent two years based on the Jaro-Winkler distance between titles, with a threshold of more than 80% similarity for acceptable matches. Matching was conducted using the stringdist package in R (Loo et al., 2020), and identified a total of 2,986 pairs. We used a window of two years after the preprint's release year to reduce the likelihood of false matches, given that 90% of bioRxiv preprints are published within one year (Abdill & Blekhman, 2019), and to reduce the computational burden of the matching process. We considered titles a viable matching variable as prior studies have found titles to be very stable between preprint and publication versions (Klein et al., 2019).

To complement our data, we also included a set of 3,038 preprint and publication pairs identified and made publicly available by Fraser, Momeni, Mayr, & Peters (2019). They matched preprints submitted to bioRxiv between November 2013 and December 2017 to publications from Crossref, Scopus, and those mentioned in bioRxiv publication notices by i) querying DOIs in Crossref's "relationship" property via the API, ii) scraping the bioRxiv website for publication notices, and iii) fuzzy matching to Scopus publications based on author names, title, and the first 100 characters of the abstract, using the Jaro-Winkler distance with a threshold of more than 80% similarity (Fraser et al., 2019).
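The title matching above was done with the stringdist package in R; the same Jaro-Winkler comparison can be sketched in pure Python (a minimal illustration under our own naming, not the paper's actual pipeline):

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 = identical, 0.0 = disjoint)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(len1, len2) // 2 - 1  # max distance for characters to match
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # count transpositions among matched characters
    transpositions, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler: boosts the Jaro score for a shared prefix (up to 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

# accept a preprint-publication pair when title similarity exceeds 80%
def is_match(title_a: str, title_b: str, threshold: float = 0.8) -> bool:
    return jaro_winkler(title_a.lower(), title_b.lower()) > threshold
```

The year-window filter described above (publication year within two years of the preprint's release) would then be applied before this comparison to keep the number of candidate pairs manageable.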
In addition, complementing our corpus this way extends our coverage from WOS to Scopus as another major source of bibliometric data.

After matching the preprints to the published versions and obtaining the published DOI, we extracted the reference lists of each published DOI from Dimensions. By using reference data for both preprints and publications from Dimensions, we were able to take advantage of Dimensions' internal identifiers and match references between the preprint and publication reference lists based on these identifiers, rather than recreating this matching using metadata, which can be more error-prone. Once all reference data had been compiled, we merged all publication sets from both sources and de-duplicated the dataset. We further controlled for publication years (i.e., the preprint year had to be equal to or smaller than the publication year) and removed four problematic pairs.

We then compared the preprint reference list against that of the published version. Following the methodology used by Strang & Siler (2015), we divided the references in each preprint-publication pair into three groups: unchanged, added, and removed.

Public Library of Science, https://plos.org/
Kompetenzzentrum Bibliometrie (KB), http://bibliometrie.info
https://app.dimensions.ai/discover/publication?or_facet_publication_type=preprint
https://arxiv.org/
https://osf.io/preprints/
https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance

Figure 1: Preprint and publication matching process
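Given matched Dimensions identifiers, this three-way grouping and the proportion calculation described below reduce to set operations; a minimal sketch (our own naming, with illustrative reference IDs):

```python
def classify_references(preprint_refs, published_refs):
    """Partition two sets of reference IDs into unchanged/removed/added and
    report proportions over the union of both lists (the baseline corpus)."""
    preprint_refs, published_refs = set(preprint_refs), set(published_refs)
    unchanged = preprint_refs & published_refs  # cited in both versions
    removed = preprint_refs - published_refs    # only in the preprint
    added = published_refs - preprint_refs      # only in the published version
    corpus = len(preprint_refs | published_refs)
    return {
        "unchanged": len(unchanged) / corpus,
        "removed": len(removed) / corpus,
        "added": len(added) / corpus,
    }

# Worked example: a preprint with five references, of which two are removed
# and one new reference is added, giving a baseline corpus of six references.
props = classify_references(
    {"r1", "r2", "r3", "r4", "r5"},  # preprint reference list
    {"r1", "r2", "r3", "r6"},        # published reference list
)
# unchanged: 3/6, removed: 2/6, added: 1/6
```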
“Unchanged” references were those that were included in both reference lists. References that were cited in the preprint but not in the published version were “removed”, while the converse – references cited in the published version that were not cited in the preprint – were “added” references.

We take the set of references used in the preprint and/or publication as the main corpus of literature the authors use. Thus, in calculating the proportion of changes in reference lists, we take this corpus as the baseline, rather than using only the reference list of the published version. For example, if a preprint had five references and two were removed, three were unchanged and one new reference was added, the proportions are calculated based on the total of six references in both preprint and publication. See Figure 2 for an example.

Figure 2: Determining preprint and publication references' change status

To analyse the sections of publications that were most affected by peer review, we used the full-texts of publications from PLOS journals. These full-texts are XML documents including hyperlinks to cited references within the documents, which allowed us to track in which publication sections the references were cited. We downloaded a full corpus in November 2019 and extracted the full-text of publications that were matched to a preprint from WOS or the Fraser et al. (2019) data using the published version's DOI (a total of 565 pairs out of our 6,024 pairs). Although PLOS suggests authors use a unified set of titles for manuscript sections, in practice a diverse set of titles is used (e.g., methods, materials and methods, methods and materials, results, results and discussion, discussion, discussions, discussion and conclusions, conclusions). We first standardised the section names used to Introduction, Method, Results, Discussion and Conclusions, and
Supplementary Material, and then identified which sections underwent the most (and least) change from preprint to published version based on the proportion of references added or unchanged. As we only had access to the full-text of publications in PLOS journals and not their preprints, our full-text section analysis is limited to added and unchanged references. In-text citations that were not clearly assigned to a section were allocated to an “
Unknown” section.

We investigated disciplinary differences on the basis of the OECD's Fields of Science and Technology (FOS) classification (OECD, 2007). The FOS is a two-level classification comprised of 42 fields at the lower, more detailed level, which aggregate to six major fields:
Agricultural Sciences (AS),
Engineering and Technology (ET),
Natural Sciences (NS),
Medical and Health Sciences (MHS),
Humanities (H) and
Social Sciences (SS).

We present data on the FOS as a common classification onto which we could map data from each of our sources, which use different discipline classifications. We used the native classification from WOS for the pairs' published versions and then mapped each document to the FOS classification based on a correspondence provided by Clarivate Analytics. Please note that some publications are assigned to multiple fields. We present results using multiple counting of publications towards each assigned discipline.

To control for the academic impact of the references added, removed and unchanged, we obtained the cumulative citations received by each reference up to three years after its publication from WOS. We chose this 3-year threshold to allow sufficient time to elapse for accrued citations to be representative of the publication's impact (Wang, 2013). Please note that citation counts are likely lower for more recent publications (e.g., after 2016) as they have had less than three years to accrue citations. Note also that for DOIs without a match in WOS (due to differences in indexed records and coverage), the count of citations is considered 0. Finally, we standardised the names of the journals of each preprint's and publication's references and matched them to the journal where the published version appeared, to control for citations to the published journal.
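The section-title standardisation applied to the PLOS full-texts above can be sketched as a small keyword-based normaliser (illustrative only; the keyword rules are our assumption, not the exact mapping used in the analysis):

```python
import re

def standardise_section(title: str) -> str:
    """Map a free-form PLOS section title onto one of the canonical
    section names; anything unmatched falls into 'Unknown'."""
    t = re.sub(r"[^a-z ]", "", title.lower()).strip()
    # keyword rules checked in order; the first hit wins
    rules = [
        ("introduction", "Introduction"),
        ("method", "Method"),
        ("material", "Method"),
        ("result", "Results"),
        ("discussion", "Discussion"),
        ("conclusion", "Conclusions"),
        ("supplement", "Supplementary Material"),
    ]
    for keyword, canonical in rules:
        if keyword in t:
            return canonical
    return "Unknown"
```

Compound titles such as "Results and Discussion" are then resolved by rule order, and in-text citations under unmatched headings are counted towards the "Unknown" section.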
Manual validation
Table 1: Yearly distribution of matched preprints and publications

Year   Number of preprints   Number of publications
2002        1                      0
2003        1                      0
2007        3                      2
2008        0                      1
2009        1                      1
2010        8                      6
2011        4                      6
2012        0                      1
2013       14                     11
2014      208                    142
2015       22                     78
2016       74                     13
2017    3,501                    823
2018    2,187                  4,930
2019        0                     10
Results
Table 1 shows the yearly distribution of our 6,024 preprint and publication pairs, which is skewed toward documents after 2013. The reason is twofold: the data prepared by Fraser et al. (2019) spanned 2013-2017, and Dimensions' coverage of preprints and availability of reference lists is mainly focused on recent years, as preprint servers are relatively new (Dimensions Resources, 2019); as noted in Figure 1, we also excluded preprints for which reference lists were unavailable in Dimensions. Approximately half (53%) of the preprints in our sample were published in the same year and 46% were published one year after the year they were released.

Figure 3 compares the proportions of references added (green), removed (red) and unchanged (blue) by FOS discipline. Natural sciences had the highest share of preprint-publication pairs in our sample, followed by medical and health sciences. In many disciplines, there were outlier publications with higher proportions of removed references (e.g., 75% in some cases). However, the general trend observed in this figure is that most disciplines have a median share of removed references below 25% (indicated by the dot inside the violins). There were, however, ten specific disciplines with a median share of removed references close to or above 25%, or in some cases close to 50%: Agriculture, forestry, fisheries; Chemical sciences; Civil engineering; Electrical and electronic engineering; Materials engineering; Mechanical engineering; Nano-technology; Physical sciences and astronomy; Social and economic geography; and Veterinary science.
See Table 4 in the Appendix for the exact frequency of pairs in each discipline and the average number and proportion of reference list changes. It is important to note, as we will discuss in the manual gold standard results, that Dimensions' coverage of preprint references was much less reliable than its coverage of publications' references; the proportions of added references are therefore substantially inflated here, and part of these references would move to unchanged if Dimensions' coverage were complete. They should thus be interpreted with caution. Note also that some fields did not have any unchanged references, which further points to the problem of preprint reference list coverage in Dimensions.

Figure 4 presents the results of our probe into reference changes by manuscript section using full-texts from PLOS journals (top) and the ratio of added to unchanged references by full-text section (bottom). In the top panel, the colours each represent a section of the manuscript, while the dotted areas indicate the unchanged proportion and the stripes the added proportion. The proportions here are relative to all references in each publication, then aggregated over journals to allow comparison between sections and journals. The PLOS journals on the Y axis are sorted by decreasing number of pairs in our sample from top to bottom. PLOS Genetics, Biology, Computational Biology, and One all have a primary focus on the natural sciences, while PLOS Pathogens is a mix of natural and medical sciences, and Neglected Tropical Diseases (NTD) is primarily a medical sciences journal.
Taking an overall view, we see that in all sections and journals (except the results section in PLOS Biology) the share of added references is higher than that of unchanged ones, and this difference is highest in NTD, i.e., the medical science journal. Looking at sections, we see that in all six journals the majority of bibliographic changes between versions occurred in the introductions and discussions of the manuscripts, more than in the methodology or results sections.

Figure 3: Distribution of the proportions of references added, removed or unchanged by discipline in WOS

However, as the introduction and discussion are the sections used to frame the study in the existing literature, it is more likely that references would be added there than in the methodology or results (Bertin, Atanassova, Gingras, & Larivière, 2016). As such, the bottom panel of Figure 4 shows the ratio of added to unchanged references in each section to account for the uneven distribution of references across a manuscript. These ratios show a different picture of the changes within sections. In 3 of the 4 natural science journals (i.e., PLOS One, Computational Biology and Biology), the methods section underwent the most bibliographic change, with around 2 times as many references added as unchanged. In NTD and Pathogens, i.e., the journals skewed more toward the medical sciences, the results and discussion sections underwent the most changes. Genetics was somewhere in between, with the discussion section undergoing the most changes, followed by the methods section. See Table 5 in the Appendix for the numbers behind this figure.

We then checked whether added, removed and unchanged references have a different share of citations to the journal where the published version of the pair appeared. Figure 5 presents the results in terms of the proportion of references to the published journal by change status. Overall, the median share of references citing the publishing journal in all FOS fields is rather low (dots inside the violins), and peer review removes at least a share of references to the published journal (red violins).
However, in engineering and technology, medical and health sciences, natural sciences and, to a lesser extent, agricultural sciences, most of the references to the published journal stayed unchanged (blue violins). In all fields there is also a share of added references to the published journal (green violins). On the one hand, this shows a pre-emptive tendency among authors to cite the published journal before submission, which then stays unchanged through peer review. On the other hand, in the case of fields with highly specialised research themes, this might signal a narrow area of research in which authors must build their arguments on a few prior studies. In addition, we checked the mean and median of the citations that added, removed or unchanged references had accrued in the first three years after publication. While the median of the cited references was close to 10 and the mean closer to 100, we could not identify significant trends of disproportionately adding highly cited references.
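The added/removed/unchanged classification underlying these proportions amounts to a set comparison of the two reference lists of a pair. The following Python sketch is illustrative only: the function names and the title normalisation are our assumptions for exposition, not the exact pipeline used in the study (which matched Dimensions reference records against WOS).

```python
import re

def normalise(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def classify_references(preprint_refs, publication_refs):
    """Split references into added/removed/unchanged by normalised title."""
    pre = {normalise(t) for t in preprint_refs}
    pub = {normalise(t) for t in publication_refs}
    return {
        "added": pub - pre,        # only in the published version
        "removed": pre - pub,      # only in the preprint
        "unchanged": pre & pub,    # present in both versions
    }

def proportions(status):
    """Share of each status relative to all references seen in either version."""
    total = sum(len(refs) for refs in status.values())
    return {name: len(refs) / total for name, refs in status.items()}
```

Because the three sets are disjoint, the denominator is simply the union of both reference lists; a missing preprint reference (as in the Dimensions data) therefore surfaces as a spurious "added" reference, which is exactly the inflation discussed above.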
Manual validation
Of the 125 pairs selected for manual validation, comparison of the preprint and publication titles confirmed that 113 (90.4%) were correct matches. After removing the 12 (9.6%) false matches and an additional 15 pairs that were inaccessible due to paywalls or the removal of the preprint from its hosting repository, we analysed 98 pairs. Based on the manual checks of the preprint and publications' online reference lists against the Dimensions lists, we identified that, on average, 30.8% of references were missing from the Dimensions reference lists of the 98 pairs. However, the reference lists were much more incomplete for preprints than for publications: 3,197 of the 5,669 (56.4%) references in the preprints were missing from the Dimensions data, or on average 51.6% of references per preprint, compared to 604 of the 5,938 (10.2%) references in the publications, or 10.0% missing per publication on average. References missing from the publication lists were more likely to be older publications, books, software, reports and non-English language documents, while missing references from preprints included these document types but also a large number of recent, English-language publications that were present in the publication reference list data. There was also a small number (65, 1.0% of all examined references) of false positives in the Dimensions reference lists that were not in the online reference lists. These usually consisted of dual entries for, for instance, a book chapter and also the book itself, when only the chapter was actually referenced.

Overall, for the 6,228 references considered across the 98 pairs, there were 495 (7.9%) added references, 225 (3.6%) removed references, and 5,443 (87.4%) unchanged references between the preprint and publication. The average per pair based on all relevant references was similar: on average, 7.7% of references were added, 3.3% were removed, and 89.0% were unchanged.
In comparing these proportions to our main results, our validation process revealed that the large percentage of missing references in the preprint lists compared to publications inflated the number of "added" references in our main study. Table 2 shows the number and percentage of references in the 98 pairs by their initial status assigned in the main study and their actual status assigned in the validation process.

Figure 4: Distribution of reference changes in full-text sections by PLOS journal (top) and ratio of added to unchanged references by section (bottom; U = unchanged, A = added)

Figure 5: Proportion of references to the published journal

Table 2: Number and percentage of references in validation sample by initial status and actual status

Initial status   Added (%)    Removed (%)   Unchanged (%)   Incorrect (%)   Total (%)
Added            430 (14.2)   0 (0.0)       2,569 (84.6)    36 (1.2)        3,035 (100.0)
Removed          0 (0.0)      87 (56.1)     47 (30.3)       21 (13.5)       155 (100.0)
Unchanged        4 (0.2)      7 (0.3)       2,329 (99.2)    8 (0.3)         2,348 (100.0)
Missing          61 (8.8)     131 (19.0)    498 (72.2)      0 (0.0)         690 (100.0)
Total            495 (7.9)    225 (3.6)     5,443 (87.4)    65 (1.0)        6,228 (100.0)

We see here that, in our validation sample, 14.2% of references initially considered added were confirmed to be so; however, the majority (84.6%) were actually unchanged, which was driven by the missing preprint references. Of the references initially considered removed, validation confirmed 56.1% were accurate, while 30.3% were actually unchanged, and 13.5% were incorrect inclusions in the Dimensions lists. Unchanged references were most accurate, with 99.2% of references initially considered unchanged confirmed during validation. However, the overall number of unchanged references is under-reported in the initial status (2,348) compared to the actual status (5,443) due to the inaccurate inclusion of many unchanged references as added or removed. Finally, of the references initially missing from both reference lists and identified during validation, the majority were unchanged (72.2%), 8.8% were added, and 19.0% were removed.

Overall then, matching of preprint-publication pairs at the document level was 90% accurate. During validation, we detected no incorrect matches between references within a pair when matches were made, indicating the title-based matching process between reference lists was accurate. However, the extent of missing data in the preprint reference lists introduced a level of inaccuracy that means results in the main study pertaining to added references should be interpreted cautiously, as they are subject to inflation. Conversely, the main study under-reports unchanged references; however, both the unchanged and removed reference results are more reliable.

Figure 6 compares the proportion of references added, removed and unchanged by FOS field.
At the top it shows the macro view based on our quantitative results, and at the bottom are the results of our manual gold standard validation. The top panels show that the natural sciences and the medical and health sciences both present a rather similar median of removed references; however, there was a larger share of pairs with higher removed proportions in the medical and health sciences. The agricultural sciences in particular showed higher proportions of references being removed in the published version (median of 25%). In all fields, there are outlier publications with a higher proportion of removed references (e.g., 75% in some cases). However, based on the validation sample in the bottom panel, we see that the proportion of unchanged references in all fields is generally higher, and the proportion of added references much lower, than is reported in the top panel. In general, though, the distributions of unchanged and removed references in our macro study and gold standard are in agreement.

Figure 7 shows the distribution of added and removed references across document sections. Added references relate to the section of the publication and removed references relate to the preprint section. "Other section" includes non-standardised sections, such as acknowledgements. We included references in each section in which they were cited; as such, references may be counted more than once, and the denominator used for the percentage is the total number of references using multiple counting. Over one-third (35.5%, 82) of removed references and 1.7% (9) of added references could not be allocated to a section as, although they were in the reference list of the preprint or publication respectively, they were not actually cited in the text of the document.
This issue occurred in 17 preprints, but was concentrated primarily in 4 that had between 9 and 18 instances each.

These results validate the pattern seen in the main study that the majority of references were added to the introduction (30.0%) and the discussion and conclusions sections (33.3%), with fewer changes in the methods (20.9%) and results sections (13.3%). The removed references also follow this pattern, with the largest percentages removed from the introduction (29.4%) and discussion and conclusions (17.7%) over the methods (9.5%) and results (6.1%). Excluding the 82 references removed only because they were inaccurately included in the preprint reference list, so that we may examine the distribution of "true" removals, nearly half of references were removed from the introduction (45.6%), 27.5% from the discussion and conclusions, 14.8% from the methods, and 9.4% from the results.

Figure 6: Distribution of references added, removed or unchanged in macro quantitative view (top) and validation sample (bottom) by field
Figure 7: Distribution of added and removed references in the validation sample by document section

Based on the qualitative analysis, we identified 10 themes describing the apparent motives for changing references during peer review, 6 of which were applicable to removing references. The number and percentage of references changed in accordance with each theme, by field, is shown in Table 3. Please note that, due to the assignment of the same publications to multiple fields in the native discipline classifications, references might be counted in more than one field, and as such the sum across fields is greater than the total.

The most common reason for removing references (35.5%) was the aforementioned referencing mistakes in the preprint. This was a particular problem in the medical sciences and social sciences, where it accounted for half and over three-quarters of removed references, respectively. Eleven percent of added references also occurred through referencing mistakes; however, the majority of these (82%) still stemmed from the preprint, as references cited in the preprint full text were missing from the reference list and so appeared to be added to the publication based on the reference list information. The remaining 10 instances were due to referencing mistakes in the publication.

There was also a series of referencing changes that reflected small updates between the preprint and publication but did not result in notable changes in the content or structure of the manuscript. These include adding references for previously uncited software or other tools used in studies (4.6%), which was particularly pronounced in engineering manuscripts (18.3%) compared to the other fields (1.2-10.3%). References were also added to support knowledge claims that were made without supporting references in the preprint (5.1%). The social and medical sciences had substantially fewer additions for this reason than the other fields (1.2-1.8% compared to 5.2-10.3%).
There was also a tendency for references to be added (8.6%) or removed (27.7%) without substantial changes, or in many cases any change at all, to the surrounding text. In most of these cases (39, 60.9%), the reference was removed while other existing references were retained, or the removed reference was replaced (20, 31.3%), while in only 5 (7.8%) cases was the reference removed such that the text was left unsupported. When references were added without changing the text, the reference was most often added to text already supported by references (82% of cases), or added to replace the existing references (18%). This theme was the most common circumstance under which references were removed in all fields, particularly the agricultural sciences and engineering, where it accounted for more than half of all removals. A small percentage of references were also removed (6.9%) and added (1.3%) as authors updated references from citing, for instance, a conference paper in the preprint to citing its published version in the publication, or changed a standard citation to a URL or another format not requiring an entry in the reference list, the reference thus appearing to be "removed". This was notably more common in engineering and the natural sciences than in other fields.

The remaining four themes were associated with notable changes to the manuscript's structure or content between the preprint and published versions. The most common of these categories was providing additional information for interpreting results. Of these 133 references, 111 were added when authors included additional text with supporting references to better relate and interpret their results in regard to the existing literature. The remaining 22 references in this category were added in sections addressing limitations or the significance of the study, which were missing from the preprint. This theme also accounted for the removal of 8.2% of references when authors reframed their studies' results in comparison to the existing literature.
It was here that the social sciences had the largest percentage of references added (32.5%), and it was also a key area for the other fields. However, the small percentage of references removed compared to added in the social sciences (4.4%), and also in the agricultural sciences and engineering, suggests these changes do not reflect a complete reframing of results but instead a more thorough contextualisation of the results within the existing literature. In comparison, there were nearly equal percentages of references added and removed in association with changes to interpreting results in the natural sciences, and an elevated percentage of removed references in the medical sciences also, suggesting there was a more extensive reframing of results within the literature in these fields.

In a related vein, the next most common theme was adding background information about the study subject (24.7%). Similar to changes for interpreting results, the majority of additions in this theme (82) were made to better place the study within the existing literature; however, these additions typically sought to establish a theoretical or practical foundation for the study, rather than assist in interpreting the results. A smaller portion were added to address a change in scope (30, largely in one study) or to revise knowledge claims. Authors also removed 16.5% of references in this revision process, often restructuring a section, most often the introduction, and removing and adding references accordingly. This was the key theme for adding references in the natural, medical, and agricultural sciences, accounting for more than 28% of added references in each field.
Once again, the natural and medical sciences had higher levels of references removed for this purpose, alongside the added references, suggesting manuscripts in these fields undergo more transformation in the theoretical foundations of their studies than in other fields.

A third theme pertained to changes in the study's methodology, which accounted for 15.4% of added references and 5.2% of removed references. Here, authors appeared to add references to provide missing or more extensive detail about data or processes (29 added references, 35.8%), to justify their selection of particular methods (17), such as demonstrating prior use for similar purposes, or to provide references for specific statistics or methods described but unreferenced in the preprint (9). These changes all pertained to the existing methodology, but another set of reference changes was triggered by changes in the study's methodology. Twenty-six references were added in accordance with new analyses conducted between the preprint and publication, and 12 references were removed as authors changed aspects of the study's method and removed details of the method and associated references (5 references), or tools or software no longer used (4 references), or moved a particular detail about the method to the supplementary information (3 references). Such methodological changes may have been requested through peer review or made of the authors' own accord before submitting the manuscript to the journal. Engineering papers appeared to undergo the most methodological changes, with higher percentages of references added and, in particular, removed for this purpose than in other fields.

Authors added another small set of references (20, 3.8%) when discussing new suggestions for applications of the study's findings and directions of future inquiry.
Like additions made to address limitations or methodological issues, these sections regarding future directions often appeared to have been specifically requested by reviewers, as they frequently appeared as wholly new paragraphs sandwiched between sections with no change. However, this is our impression based on the changes between preprints and manuscripts, as we did not have access to the reviewers' reports to confirm the requests. The percentage of references added for this purpose was relatively equal across fields. Finally, the "other" category included one reference for which we did not have access to the publication's full text and so could not identify a reason for change, and a second reference which was a reminder left in the preprint from the author to themselves to cite their in-review article, a task they completed in the publication. We have included in the Appendix the percentage of references removed and added in these themes by manuscript section.

Table 3: Number and percentage of references by thematic group and field

Added
Reason                               Social sci. (%)   Natural sci. (%)   Medical sci. (%)   Agri. sci. (%)   Engineering (%)   Total (%)
Change in results interpretation     27 (32.5)         66 (21.3)          62 (28.1)          12 (20.7)        11 (15.5)         133 (25.3)
Change in background framing         8 (9.6)           87 (28.1)          67 (30.3)          18 (31.0)        6 (8.5)           130 (24.7)
Change in methodological details     10 (12.0)         52 (16.8)          27 (12.2)          8 (13.8)         13 (18.3)         81 (15.4)
Referencing mistakes                 28 (33.7)         24 (7.7)           32 (14.5)          1 (1.7)          16 (22.5)         57 (10.8)
Change without modifying text        4 (4.8)           34 (11.0)          16 (7.2)           3 (5.2)          4 (5.6)           45 (8.6)
Support for unsupported claims       1 (1.2)           16 (5.2)           4 (1.8)            6 (10.3)         4 (5.6)           27 (5.1)
References for software, tools, etc  1 (1.2)           15 (4.8)           5 (2.3)            6 (10.3)         13 (18.3)         24 (4.6)
Future directions                    4 (4.8)           9 (2.9)            7 (3.2)            2 (3.4)          2 (2.8)           20 (3.8)
Updated references                   0 (0.0)           6 (1.9)            1 (0.5)            1 (1.7)          2 (2.8)           7 (1.3)
Other                                0 (0.0)           1 (0.3)            0 (0.0)            1 (1.7)          0 (0.0)           2 (0.4)
Total                                83 (100.0)        310 (100.0)        221 (100.0)        58 (100.0)       71 (100.0)        526 (100.0)

Removed
Reason                               Social sci. (%)   Natural sci. (%)   Medical sci. (%)   Agri. sci. (%)   Engineering (%)   Total (%)
Change in results interpretation     3 (4.4)           28 (20.6)          14 (10.9)          3 (5.2)          0 (0.0)           19 (8.2)
Change in background framing         3 (4.4)           11 (8.1)           12 (9.4)           0 (0.0)          1 (2.1)           38 (16.5)
Change in methodological details     0 (0.0)           8 (5.9)            3 (2.4)            4 (6.9)          8 (16.7)          12 (5.2)
Updated references                   2 (2.9)           16 (11.8)          3 (2.4)            4 (6.9)          8 (16.7)          16 (6.9)
Change without modifying text        6 (8.8)           54 (39.7)          31 (24.6)          42 (72.4)        27 (56.3)         64 (27.7)
Referencing mistakes                 54 (79.4)         19 (14.0)          63 (50.0)          5 (8.6)          4 (8.3)           82 (35.5)
Total                                68 (100.0)        136 (100.0)        128 (100.0)        58 (100.0)       48 (100.0)        231 (100.0)
Discussion
In this paper, we sought to understand how the peer review process influences manuscripts in different disciplines, as measured by changes in reference lists and the document sections most altered between the submitted and published versions. To do this, we matched more than 6,000 preprint-publication pairs across multiple disciplines and quantified the changes in their reference lists. We also quantified the change in references per manuscript section for 565 pairs from PLOS journals. In addition, we conducted manual checks of a randomly chosen sample of 98 pairs to validate our results, and undertook a qualitative analysis based on the context of the reference to offer insight into the potential reasons for changing references.

Our study contributes to the field of studies of peer review identified by Batagelj et al. (2017). We used a mixed-methods approach to provide a quantitative, macro exploration and complemented the results with an in-depth qualitative analysis. Although we lack information about the motivations behind the citing behaviour of authors, our results offer insight into how peer review, or possible pre-emptive modifications authors decide to apply, changes the reference lists of manuscripts.

In our macro, quantitative investigation, we found ten specific disciplines, mainly from the natural sciences, with a median of up to 25% or in some cases even 50% of references removed between the preprint and publication stages. Our more in-depth look at the full-text sections of publications in PLOS journals showed that, in the natural sciences, it is the methods section that undergoes the most changes, while in the medical and health sciences, the results and discussion sections underwent the most changes.
Furthermore, we found that publications pertaining to pure fields of science or disciplines tend to undergo fewer changes, while inter/multi-disciplinary publications (e.g., PLOS Genetics and Computational Biology) go through a mixture of changes similar to those of the fields they bridge. We found a rather stable trend of authors citing the journal in which the published version appeared that remained unchanged during peer review. This could signal pre-emptive behaviour of authors citing prior works published in the journals they aim to publish in, or may reflect that the journal is a key source, or perhaps one of only a few journals, in a narrow, specialised research area. Some differences between disciplines in the extent of removed references might reflect discipline- or journal-specific hesitance to accept preprints as reliable sources, as they have not undergone peer review, and so are removed at the request of peer reviewers or editors, although we cannot confirm this without access to the peer review comments sent to authors.

Using our gold standard validation sample, we determined that the preprint-publication matching at the document level was 90% accurate. However, when matching the reference lists between documents, we found that preprint reference lists in Dimensions were often incomplete, with 57% of references missing, compared to 10% missing from the publication reference lists. This means the proportion of added references in the quantitative results is inflated and, based on the validation sample, 85% of added references were actually unchanged. Removed references were accurately identified in 56% of cases, while 30% were actually unchanged and 14% reflected inaccuracies in the Dimensions reference lists.
Although we accurately identified unchanged references 99% of the time, the results of the macro quantitative study under-represented the proportion of unchanged references, as a number of unchanged references were inaccurately identified as added or removed.

Regarding the overall pattern of referencing changes, there was notable stability in referencing between preprints and publications across all fields. Nearly 90% of references present in the preprint were unchanged between versions, 8% were newly added during peer review, and only 4% were removed. Indeed, the most common reason for removing references was that they were incorrectly included in the reference list of the preprint and this was corrected in the publication, suggesting the publication process improves the accuracy of citation practices. We identified nine additional themes under which bibliographic changes occurred during peer review. Four of these pertained to changes that did not result in substantial, or indeed any, changes to the content or structure of the manuscript between versions, such as adding references for software, tools, or for claims that were unsupported in the preprint, updating references from conference papers or other early versions to publications, and changing references without changing the surrounding text. This latter practice was particularly common for removing references and may relate to reductions to conform to word count limitations, as noted by Teplitskiy (2016). Authors adding references for these purposes suggests reviewers have an important role in detecting unsupported claims and ensuring software, tools, and methods are appropriately cited in publications.

The bulk of referencing changes resulted from changes to the structure or content of the manuscript, including reframing the study's placement or the interpretation of its results within the existing literature, addressing the future directions of research, or changes in the methods' description or use.
These changes align with the purposes of peer review to assess the methodological soundness of a study and improve the communication of its results to the academic community (De Vries et al., 2009). However, we observed differences between fields in the effect of peer review. Peer review in engineering appears to be more oriented toward methodological soundness, with most changes occurring in relation to methodological detail, citing software and tools, and the interpretation of results, likely corresponding with the methodological changes. In the other fields, the focus of peer review appeared to be on the theoretical framing of the study and its results. While the majority of reference additions in all of these fields occurred in relation to reframing the study's theoretical background and results, we also observed higher levels of removing references for this purpose in the natural sciences and medical sciences than in the social sciences and agricultural sciences. This suggests that natural and medical science publications might undergo more extensive reframing of studies, with substitution of references as foundational and explanatory theories are exchanged, whereas studies in the social and agricultural sciences appear to become more embedded in the field, with references added but not removed. This reframing may reflect how, as described by Teplitskiy (2016), theoretical framing is a negotiation between authors and reviewers regarding how studies and their results are interpreted.
However, in the social sciences there is perhaps a larger overlap in the theories applicable to results than in the natural and medical sciences, where interpretation under one theory may necessarily preclude interpretation under another, thus requiring the removal of existing and the addition of new references.

Arguably, the predominant focus of reviewers on theory over methodology in most fields may reflect the awareness reviewers, as academics themselves, have of the resource-intensive process of collecting and analysing data that goes into producing a publication. For many reasons, such as funding, time, and the availability of data or equipment, a researcher usually cannot simply collect new data or run entirely new analyses to address reviewers' concerns, particularly should these concerns not pertain to fatal flaws. As such, reviewers may focus on addressing the disconnection between the theoretical framing and methodological aspects of the study by suggesting reframing the theory to align with the questions that can be answered by the available data in a data-driven approach, rather than the converse of retaining the question and adjusting the data in a question-driven approach (Teplitskiy, 2016). A future test of this hypothesis could be the focus of peer review in grant applications, where both the theoretical and methodological aspects of a project are critiqued before any resources have been expended.

Finally, our findings regarding the social sciences align somewhat with those of Strang & Siler (2015) and Teplitskiy (2016) in their studies of sociology. As they did, we also find that the key focus of peer review in the social sciences appears to be the theoretical framing of the study and its results.
However, changes in methodological details were also the second most common reason for references to be added in the social sciences, at levels generally in line with the other fields, suggesting methodological soundness is not a neglected aspect of peer review in the social sciences.
Limitations and future directions
There are limitations to our study that are important to consider.

While using references as a proxy for change in a section of a manuscript allows us to examine a large number of manuscripts (otherwise impossible with methods such as content analysis, e.g., Strang & Siler (2015)), the reliance on references might be flawed. For instance, methodologies, particularly when innovative, may not include references, providing unreliable results about the extent to which they are critiqued and addressed during peer review.

Our reliance on the completeness of reference lists in each of the databases (hence the use of two databases, to increase the likelihood of completeness), which is particularly an issue for the humanities and social sciences, means we could be missing references. However, our research question sought to examine whether a documented problem in the social sciences was also present in the hard sciences, and the hard sciences have a much more complete level of source references (Stephen, Stahlschmidt, & Hinze, 2020).

We matched references from Dimensions to the reference lists in WOS. In these matching procedures, there is a probability of losing some references, which might affect the magnitude of the observed trends; in reality they may be greater than presented here, so our results should be considered the lower end of the continuum.

Detecting a match between the preprint and published version of a publication is also a difficult task. We used multiple approaches, e.g., similarity between the titles of both versions, complemented with currently existing pairs of DOIs (Fraser et al., 2019), but this is still prone to errors and could be improved.

Our study considers only the impact of peer review on eventually published articles and leaves the rejected ones out.

Finally, authors of academic publications may often improve their manuscript beyond the advice given by the reviewers.
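The title-similarity matching of preprint-publication pairs mentioned among these limitations can be illustrated with a minimal sketch using Python's difflib. The threshold and greedy pairing here are illustrative assumptions only, not the exact parameters of our pipeline, which additionally exploited known DOI pairs.

```python
from difflib import SequenceMatcher

def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two titles, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_pairs(preprint_titles, publication_titles, threshold=0.93):
    """Greedily pair each preprint with its most similar publication title.

    Pairs below the similarity threshold are discarded rather than matched,
    trading recall for precision.
    """
    pairs = []
    for pre in preprint_titles:
        best = max(publication_titles,
                   key=lambda pub: title_similarity(pre, pub),
                   default=None)
        if best is not None and title_similarity(pre, best) >= threshold:
            pairs.append((pre, best))
    return pairs
```

Even with a strict threshold, near-identical titles can belong to different documents, which is why a manual validation of a sample, as performed in this study, remains necessary.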
Thus, the observed trends cannot be solely attributed to the peer review process; they may also reflect changes and improvements made by authors of their own accord, so we may have overestimated the effect of peer review.

Noting these limitations, our study offers useful insight into the referencing changes manuscripts undergo during peer review and the differences in the effect of peer review on manuscripts across fields. Future studies could combine the advantages of sample size and specificity in detecting changes by using automated textual analysis, such as that used in plagiarism detection software, which could facilitate the identification of specific changes in the text of a large sample of manuscripts (Strang & Dokshin, 2019) and avoid the reliance on reference lists and the issues we encountered here. There are also European-level initiatives pursuing the goal of opening up peer review data (Squazzoni, Grimaldo, & Marušić, 2017; Squazzoni et al., 2020). If those initiatives proceed, future research could conduct more fine-grained analyses of the changes that occur during peer review, from content changes to reference suggestions by reviewers (e.g., analysis of review notes, similar to Casnici et al. (2017)). Whether removed references were dropped to reduce the word count of the manuscript could also be an avenue for future studies, by investigating journal guidelines for the presence of strict word limits.
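To illustrate the kind of title-based preprint-publication matching described above, the sketch below pairs preprints with published versions via a normalized string-similarity ratio. This is an illustrative reconstruction, not our actual pipeline (which used the stringdist R package and existing DOI pairs from Fraser et al. (2019)); the function names and the 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher

def normalize(title):
    """Lower-case and strip punctuation so superficial edits do not break a match."""
    return "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace())

def title_similarity(a, b):
    """Return a similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match_pairs(preprints, publications, threshold=0.9):
    """Greedy matching: pair each preprint with its most similar publication
    title, keeping only matches at or above the threshold."""
    pairs = []
    for pid, ptitle in preprints.items():
        best_id, best_title = max(publications.items(),
                                  key=lambda kv: title_similarity(ptitle, kv[1]))
        score = title_similarity(ptitle, best_title)
        if score >= threshold:
            pairs.append((pid, best_id, round(score, 2)))
    return pairs
```

In practice such fuzzy matching would be complemented by metadata checks (author overlap, publication dates) to reduce false positives, as pure title similarity can conflate similarly named studies.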
Conclusions
We conclude from our results that peer review does appear to function within its purposes of examining the methodological soundness of studies and improving manuscripts as tools to communicate academic findings. Despite a predominant focus on theoretical reframing in all fields but engineering, peer review also addresses methodological issues in each field. Further, this focus on theoretical framing, at least in the social and agricultural sciences, appears to serve to more thoroughly embed studies within their fields, emphasising the focus of reviewers on improving communication of the studies' results, which was also evident in reviewers' apparent encouragement of authors to discuss the applications of their study's findings and future research directions.
Acknowledgements
We would like to thank Fraser et al. (2019) for their publicly available data, which we used to complement our dataset. We would also like to thank Martin Reinhart and Bahar Mehmani for helpful comments on an earlier version of this paper.
Funding Information
Data were obtained from the Kompetenzzentrum Bibliometrie (Competence Centre for Bibliometrics), Germany, which is funded by the Federal Ministry for Education and Research (BMBF), Germany, under grant number 01PQ17001.
Data Availability
Publication-level microdata cannot be made publicly available due to the licensing and contract terms of the original data. However, preprint-publication pair-level data are available from the authors on request.
Declaration of competing interest
The authors declare that they have no conflict of interest.
Appendix
Table 4 presents the aggregated version of the data behind Figure 3: the number of preprint-publication pairs and the average number and average proportion of added, removed, and unchanged references in FOS disciplines. Table 5 presents the data used in Figure 4: the number of preprint-publication pairs in each PLOS journal and the proportion of changes in references used in full-text sections. Figure 8 shows the percentage of references added and removed by theme and section of the manuscript. Percentages are of the total number of added or removed references per manuscript section.

Table 4: Preprint-publication pairs by FOS disciplines with multiple counting (N = number of references, P = proportion of references)
FOS n_pairs N_Unchanged N_Removed N_Added P_Unchanged P_Removed P_Added
Biological sciences 2,514 27.18 6.07 31.22 0.47 0.09 0.51
Other natural sciences 1,575 27.31 5.04 26.99 0.50 0.08 0.48
Basic medical research 701 28.05 7.93 34.50 0.45 0.11 0.52
Clinical medicine 430 24.51 10.19 32.46 0.42 0.17 0.55
Health sciences 360 19.54 7.96 25.16 0.44 0.14 0.53
Environmental biotechnology 341 21.46 5.05 22.76 0.49 0.10 0.48
Psychology 166 27.73 7.81 37.39 0.43 0.12 0.55
Computer and information sciences 152 19.16 5.19 15.95 0.54 0.13 0.45
Mathematics 128 17.03 5.00 13.83 0.54 0.13 0.43
Chemical sciences 83 16.35 14.10 35.67 0.33 0.27 0.63
Earth and related environmental sciences 56 25.58 9.07 30.18 0.44 0.13 0.58
Physical sciences and astronomy 40 23.84 16.44 32.49 0.44 0.28 0.57
Medical engineering 36 21.23 10.50 35.14 0.39 0.17 0.59
Agriculture, forestry, fisheries 31 18.75 17.12 40.23 0.37 0.28 0.58
Materials engineering 28 14.36 20.00 28.79 0.31 0.35 0.63
Other engineering and technologies 26 14.47 12.12 30.81 0.39 0.24 0.60
Other agricultural science 15 13.55 12.22 28.80 0.39 0.20 0.59
Nano-technology 14 6.20 21.86 33.21 0.21 0.31 0.69
Environmental engineering 12 17.29 10.88 33.17 0.32 0.18 0.70
Veterinary science 11 15.60 26.17 35.91 0.29 0.37 0.61
Social and economic geography 10 12.00 20.44 32.20 0.30 0.22 0.71
Industrial biotechnology 9 27.38 8.25 39.11 0.41 0.10 0.59
Sociology 9 19.43 9.17 36.78 0.34 0.17 0.62
Economics and business 7 8.00 33.71 0.21 0.79
Animal and dairy science 6 12.75 13.00 23.33 0.50 0.21 0.60
Civil engineering 6 24.00 24.67 0.46 0.54
Mechanical engineering 5 10.00 12.40 23.00 0.42 0.31 0.61
Chemical engineering 4 2.00 28.00 41.25 0.03 0.60 0.68
Languages and literature 4 32.75 2.67 31.25 0.48 0.04 0.48
Educational sciences 3 35.00 14.33 19.33 0.62 0.31 0.49
Electrical eng, electronic eng 3 54.00 16.67 24.67 0.66 0.31 0.47
Law 2 9.00 27.00 0.24 0.76
Media and communication 2 4.00 1.00 26.50 0.14 0.03 0.85
Art 1 16.00 5.00 0.76 0.24
History and archaeology 1 16.00 5.00 0.76 0.24
Other social sciences 1 18.00 79.00 0.19 0.81
Political science 1 6.00 12.00 0.33 0.67
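As a minimal sketch, the per-pair counts underlying the table above can be computed by set operations on the two reference lists. This assumes references are identified by a unique key such as a DOI, and uses the union of both lists as the denominator for the proportions; the exact denominators used in the paper may differ.

```python
def reference_change(preprint_refs, published_refs):
    """Compare the reference lists of a preprint and its published version,
    returning counts and proportions of unchanged, removed, and added references."""
    pre, pub = set(preprint_refs), set(published_refs)
    unchanged = pre & pub   # present in both versions
    removed = pre - pub     # cited only in the preprint
    added = pub - pre       # introduced during revision
    total = len(pre | pub)  # union of both lists (assumed denominator)
    return {
        "n_unchanged": len(unchanged),
        "n_removed": len(removed),
        "n_added": len(added),
        "p_unchanged": len(unchanged) / total if total else 0.0,
        "p_removed": len(removed) / total if total else 0.0,
        "p_added": len(added) / total if total else 0.0,
    }
```

In practice the hard part is not the set arithmetic but establishing reference identity across databases, since the same cited work may carry different metadata in Dimensions and WOS.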
Table 5: Changes in references in full-text sections by PLOS journals (U = unchanged, A = added)
PLOS journals n_pairs introA introU methodA methodU resultsA resultsU discussionA discussionU supplementsA supplementsU unknownA unknownU
One 306 0.23 0.15 0.09 0.05 0.05 0.04 0.19 0.11 0.00 0.00 0.06 0.03
Computational Biology 94 0.22 0.09 0.11 0.04 0.15 0.07 0.16 0.07 0.01 0.00 0.06 0.02
Genetics 83 0.16 0.09 0.12 0.06 0.15 0.09 0.16 0.07 0.01 0.01 0.05 0.03
Pathogens 31 0.19 0.07 0.10 0.04 0.16 0.05 0.24 0.07 0.01 0.00 0.05 0.02
Biology 30 0.16 0.10 0.09 0.04 0.10 0.11 0.20 0.12 0.00 0.00 0.05 0.03
Neglected Tropical Diseases 21 0.24 0.08 0.15 0.05 0.08 0.01 0.26 0.08 0.00 0.00 0.04 0.02
[Figure 8 legend: sections No section, Supplement, Discussion/conclusions, Results, Methods, Introduction; themes Framing change, Method change, Reference for tools, Interpretation change, Future study, Previously unsupported, No change, Updated reference, Referencing mistakes]
Figure 8: The percentage of removed (top) or added (bottom) references per section by each thematic group.

References
Abdill, R. J., & Blekhman, R. (2019). Meta-Research: Tracking the popularity and outcomes of all bioRxiv preprints. eLife, e45133. https://doi.org/10.7554/eLife.45133
Baas, J., & Fennell, C. (2019). When peer reviewers go rogue - Estimated prevalence of citation manipulation by reviewers based on the citation patterns of 69,000 reviewers, 12.
Batagelj, V., Ferligoj, A., & Squazzoni, F. (2017). The emergence of a field: A network analysis of research on peer review. Scientometrics, (1), 503-532. https://doi.org/10.1007/s11192-017-2522-8
Bertin, M., Atanassova, I., Gingras, Y., & Larivière, V. (2016). The invariant distribution of references in scientific articles. Journal of the Association for Information Science and Technology, (1). https://doi.org/10.1002/asi.23367
Bruggeman, J., Traag, V. A., & Uitermark, J. (2012). Detecting Communities through Network Data. American Sociological Review, (6), 1050-1063.
Carneiro, C. F. D., Queiroz, V. G. S., Moulin, T. C., Carvalho, C. A. M., Haas, C. B., Rayêe, D., . . . Amaral, O. B. (2020). Comparing quality of reporting between preprints and peer-reviewed articles in the biomedical literature. Research Integrity and Peer Review. https://doi.org/10.1186/s41073-020-00101-3
Casnici, N., Grimaldo, F., Gilbert, N., & Squazzoni, F. (2017). Attitudes of referees in a multidisciplinary journal: An empirical analysis. Journal of the Association for Information Science and Technology, (7), 1763-1771. https://doi.org/10.1002/asi.23665
De Vries, D. R., Marschall, E. A., & Stein, R. A. (2009). Exploring the peer review process: What is it, does it work, and can it be improved? Fisheries, (6), 270-279.
Dimensions Resources. (2019). A Guide to the Dimensions Data Approach. https://doi.org/10.6084/M9.FIGSHARE.5783094
Dworkin, J. D., Linn, K. A., Teich, E. G., Zurn, P., Shinohara, R. T., & Bassett, D. S. (2020). The extent and drivers of gender imbalance in neuroscience reference lists. Nature Neuroscience, (8), 918-926. https://doi.org/10.1038/s41593-020-0658-y
Fraser, N., Momeni, F., Mayr, P., & Peters, I. (2019). The effect of bioRxiv preprints on citations and altmetrics. bioRxiv, 673665. https://doi.org/10.1101/673665
Gilbert, N. (1977). Referencing as persuasion. Social Studies of Science, (1), 113-122.
Goodman, S. N., Berlin, J., Fletcher, S. W., & Fletcher, R. H. (1994). Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Annals of Internal Medicine, (1), 11-21. https://doi.org/10.7326/0003-4819-121-1-199407010-00003
Hengel, E. (2017). Publishing while Female. Are women held to higher standards? Evidence from peer review (Working Paper). University of Cambridge. https://doi.org/10.17863/CAM.17548
Hirschauer, S. (2010). Editorial judgments: A praxeology of 'voting' in peer review. Social Studies of Science, (1), 71-103.
Hofstra, B., Kulkarni, V. V., Munoz-Najar Galvez, S., He, B., Jurafsky, D., & McFarland, D. A. (2020). The Diversity-Innovation Paradox in Science. Proceedings of the National Academy of Sciences, 201915378. https://doi.org/10.1073/pnas.1915378117
Hopewell, S., Collins, G. S., Boutron, I., Yu, L.-M., Cook, J., Shanyinde, M., . . . Altman, D. G. (2014). Impact of peer review on reports of randomised trials published in open peer review journals: Retrospective before and after study. BMJ, (jul01 8), g4145. https://doi.org/10.1136/bmj.g4145
Hyland, K., & Jiang, F. (2020). "This work is antithetical to the spirit of research": An anatomy of harsh peer reviews. Journal of English for Academic Purposes, 100867. https://doi.org/10.1016/j.jeap.2020.100867
Jefferson, T., Wager, E., & Davidoff, F. (2002). Measuring the quality of editorial peer review. JAMA, (21), 2786-2790. https://doi.org/10.1001/jama.287.21.2786
Khelfaoui, M., Larrègue, J., Larivière, V., & Gingras, Y. (2020). Measuring national self-referencing patterns of major science producers. Scientometrics, (2), 979-996. https://doi.org/10.1007/s11192-020-03381-0
Klein, M., Broadwell, P., Farb, S. E., & Grappone, T. (2019). Comparing published scientific journal articles to their pre-print versions. International Journal on Digital Libraries, (4), 335-350. https://doi.org/10.1007/s00799-018-0234-1
Loo, M. van der, Laan, J. van der, Team, R. C., Logan, N., Muir, C., & Gruber, J. (2020, October). Stringdist: Approximate String Matching, Fuzzy Text Search, and String Distance Functions.
Maliniak, D., Powers, R., & Walter, B. F. (2013). The Gender Citation Gap in International Relations. International Organization, (4), 889-922. https://doi.org/10.1017/S0020818313000209
Merton, R. K. (1968). The Matthew effect in science: The reward and communication systems of science are considered. Science, (3810), 56-63.
Miniaci, R., & Pezzoni, M. (2020). Social connections and editorship in economics. Canadian Journal of Economics/Revue canadienne d'économique, n/a(n/a). https://doi.org/10.1111/caje.12460
Murray, D. (2020, August). A scientometric analysis of disagreement in science. Learned Publishing.
Roberts, J. C., Fletcher, R. H., & Fletcher, S. W. (1994). Effects of peer review and editing on the readability of articles published in Annals of Internal Medicine. JAMA, (2), 119-121. https://doi.org/10.1001/jama.1994.03520020045012
Squazzoni, F., Ahrweiler, P., Barros, T., Bianchi, F., Birukou, A., Blom, H. J. J., . . . Willis, M. (2020). Unlock ways to share data on peer review. Nature, (7796), 512-514. https://doi.org/10.1038/d41586-020-00500-y
Squazzoni, F., Grimaldo, F., & Marušić, A. (2017). Publishing: Journals could share peer-review data. Nature, (7658), 352. https://doi.org/10.1038/546352a
Stephen, D., Stahlschmidt, S., & Hinze, S. (2020). Performance and Structures of the German Science System 2020. Studien zum deutschen Innovationssystem.
Strang, D., & Dokshin, F. (2019). The production of managerial knowledge and organizational theory: New approaches to writing, producing and consuming theory. In T. B. Zilber, J. M. Amis, & J. Mair (Eds.), Research in the Sociology of Organizations (pp. 103-121). Emerald Publishing Limited.
Strang, D., & Siler, K. (2015). Revising as Reframing: Original Submissions versus Published Papers in Administrative Science Quarterly, 2005 to 2009.