Frequently Co-cited Publications: Features and Kinetics

Abstract

Co-citation measurements can reveal the extent to which a concept representing a novel combination of existing ideas evolves towards a specialty. The strength of co-citation is represented by its frequency, which accumulates over time. Of interest is whether underlying features associated with the strength of co-citation can be identified. We use the proximal citation network for a given pair of articles (x, y) to compute theta, an a priori estimate of the probability of co-citation between x and y, prior to their first co-citation. Thus, low values for theta reflect pairs of articles for which co-citation is presumed less likely. We observe that co-citation frequencies are a composite of power-law and lognormal distributions, and that very high co-citation frequencies are more likely to be composed of pairs with low values of theta, reflecting the impact of a novel combination of ideas. Furthermore, we note that the occurrence of a direct citation between two members of a co-cited pair increases with co-citation frequency. Finally, we identify cases of frequently co-cited publications that accumulate co-citations after an extended period of dormancy.



Sitaram Devarakonda, James Bradley, Dmitriy Korobskiy, Tandy Warnow, and George Chacko

Netelabs, NET ESolutions Corporation, McLean, VA
Raymond Mason School of Business, Coll. of William & Mary, Williamsburg, VA
Department of Computer Science, Univ. of Illinois, Urbana-Champaign, IL

May 12, 2020

arXiv [cs.DL]

Introduction

Co-citation, “the frequency with which two documents from the earlier literature are cited together in the later literature”, was first described in 1973 [34, 50]. As noted by [50], co-citation patterns differ from bibliographic coupling patterns [28] but align with patterns of direct citation, and frequently co-cited publications must have high individual citations.

Co-citation has been the subject of further study and characterization, for example, comparisons to bibliographic coupling and direct citation [6], the study of invisible colleges [24, 41], construction of networks by co-citation [52, 53], evaluation of clusters in combination with textual analysis [8], textual similarity at the article and other levels [14], and the fractal nature of publications aggregated by co-citations [58].

Co-citations provide details of the relationship between key (highly cited) ideas, and changes in co-citation patterns over time may provide insight into the mechanism by which new schools of thought develop. Implicit in the definition of co-citation is the novel combination of existing ideas, but only some frequently co-cited article pairs reflect surprising combinations. For example, two publications presenting the leading methods for the same computational problem may be highly co-cited, but this does not reflect a novel combination of ideas. Similarly, two publications describing methods that often constitute part of the same workflow may be highly co-cited, but these co-citations are also not surprising. On the other hand, for two articles in different fields, frequent co-citation is generally unexpected.

Novel, atypical, or otherwise unusual combinations of co-cited articles have been explored at the journal level [65, 10, 7, 57]. However, journal-level classifications have limited resolution relative to article-level studies, which may better represent the actual structure and aggregations of the scientific literature [49, 29, 63, 37, 26].
Accordingly, we sought to discover measurable characteristics of frequently co-cited publications from an article-level perspective.

To study frequently co-cited articles, we have developed a novel graph-theoretic approach that reflects the citation neighborhood of a given pair of articles. In seeking to determine the degree to which a co-cited pair of papers represented a surprising combination, we wished to avoid journal-based field classifications, which present challenges. Instead, we attempted to use citation history to produce an estimate of the probability that a given pair of publications (x, y) would be co-cited. Because we focus on activity before the pair is first co-cited, the observed "probability" of co-citation is zero by definition: there are no co-citations yet. Hence, we approximated co-citation probabilities: we treat an article that cites one member of a co-cited pair and also cites at least one article that cites the other member as a proxy for co-citation. Specifically, given a pair of publications x, y, we construct a directed bipartite graph whose vertex set contains all publications that cite either x or y prior to their first co-citation. We then compute θ, a normalized count of such proxies, and use it to predict the probability of co-citation between x and y. This approach enables an evaluation that is specific to the given pair of articles, and does so without substantial computational cost, while avoiding definitions of disciplines derived from journals or having to measure disciplinary distances.

To support our analysis, we constructed a dataset of articles from Scopus [19] that were published in the eleven-year period 1985-1995, and extracted the cited references in these articles.
Recognizing that frequently co-cited publications must derive from highly cited publications [50], we identified those reference pairs (33.6 million pairs) for each article in the dataset that are drawn from the top 1% most cited articles in Scopus and measured their frequency of co-citation.

To investigate which statistical distributions might best describe the co-citation frequencies in these 33.6 million co-cited pairs, we reviewed prior work on distributions of citation frequency [47, 20, 44, 45, 40, 64, 55, 56, 48]. This research has fit the frequency distribution of citation strength sometimes to a power law distribution and other times to a lognormal distribution. A graph of the analogous co-citation data suggests that power law or lognormal distributions are candidates for describing co-citation strength as well, and so we investigated that conjecture. Interestingly, [38] notes that the debate over the appropriateness of power law versus lognormal distributions is not confined to bibliometrics, but has been at issue in many disciplines and contexts.

To study how the best-fit distributional function and parameters for co-citation might vary with θ, we stratified co-citation frequency data. We also measured whether a direct link exists between two members of a co-cited pair (i.e., whether one member of a pair cites the other) and how this property is related to co-citation frequencies. We find that the distribution of co-citation frequencies varies with θ and that a power law distribution fits co-citation frequencies more often when θ is small, whereas a lognormal distribution fits more often for large θ.

A pertinent aspect of co-citation is the rate at which frequencies accumulate. While citation dynamics of individual publications have been fairly well studied by others, for example, [62, 20], the dynamics of co-cited articles are less well studied.
Our interest was the special case analogous to the Sleeping Beauty phenomenon [59, 27], which may reflect delayed recognition of scientific discovery and the causes attributed to it [35, 21, 22, 15, 2, 23]. Thus, we also identified co-cited pairs that featured a period of dormancy before accumulating co-citations.

Materials & Methods

Data

Citation counts were computed for all Scopus articles (88,639,980 records) updated through December 2019, as implemented in the ERNIE project [30]. Records with corrupted or missing publication years or classified as ‘dummy’ by the vendor were then removed, resulting in a dataset of 76,572,284 publications. Hazen percentiles of citation counts, grouped by year of publication, were calculated for these data [4]. The top 1% of highly cited publications from each year were combined into a set of highly cited publications consisting of 768,993 publications.

Publications of type ‘article’, each containing at least five cited references and published in the 11-year period 1985-1995, were extracted from Scopus to form a dataset of 3,394,799 publications and 51,801,106 references (8,397,935 unique). For each of these publications, all possible reference pairs were generated and then restricted to those pairs where both members were in the set of highly cited publications (above).

For example, the data for 1985 consisted of 223,485 articles after processing as described above. Computing all reference pairs (that were also members of the highly cited publication set of 768,993) from these 223,485 articles gave rise to 2,600,101 reference pairs (Table 1) that ranged in co-citation frequency from 1 to 874 within the 1985 dataset; from 1 to 11,949 across the 11-year period 1985-1995; and from 1 to 35,755 across all of Scopus. Collectively, the publications in our 1985-1995 dataset generated 33,641,395 unique co-citation pairs, for which we computed co-citation frequencies across all of Scopus.
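The pair-generation step described above can be illustrated with a minimal Python sketch. It is not the ERNIE implementation: the function name `cocitation_counts`, the in-memory dictionary layout, and the toy identifiers are assumptions made only for illustration. It generates all reference pairs within each citing article, keeps those where both members belong to the highly cited set, and counts how often each pair recurs.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references_by_article, highly_cited):
    """Count co-citations of highly cited reference pairs.

    references_by_article: dict mapping a citing article to its cited references.
    highly_cited: set of publications treated as highly cited (hypothetical input).
    """
    counts = Counter()
    for refs in references_by_article.values():
        # Restrict to highly cited references; sorting gives every unordered
        # pair a single canonical (smaller, larger) representation.
        kept = sorted(set(refs) & highly_cited)
        counts.update(combinations(kept, 2))
    return counts


# Toy data (hypothetical identifiers): three citing articles.
toy_refs = {
    "a1": ["p1", "p2", "p3", "p9"],
    "a2": ["p2", "p1", "p4"],
    "a3": ["p3", "p1"],
}
toy_top = {"p1", "p2", "p3"}
toy_counts = cocitation_counts(toy_refs, toy_top)
```

In this toy example the pair (p1, p2) is co-cited by a1 and a2, so its frequency is 2, while p9 and p4 never appear in a pair because they are outside the highly cited set.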

Figure 1: The workflow we used to generate a dataset of 33,641,395 pairs of co-cited publications from references cited by articles in Scopus published in the years 1985-1995.

[Figure 1 workflow boxes: Scopus; count citations for all publications in Scopus; group by year of publication; generate Hazen percentiles; select top 1% from each year; select publications of type article, published in 1985-1995, with at least 5 cited references; generate pairwise combinations of all references within each selected article; restrict to pairs of references belonging to the top 1% cited articles; combine across 1985-1995 and compute co-citation frequencies; 33.6 million co-cited pairs.]

Table 1: Summary of Analyzed Data

Publications of type article that had at least five cited references indexed in Scopus were selected from the eleven years 1985-1995. All possible reference pairs were generated for the cited references of these articles and then restricted to those pairs where both members were in the set of 768,993 highly cited publications. The column Co-cited Pairs shows the number of pairs in each year after the restriction was applied.

Year    Articles    References    Co-cited Pairs
1985    223,485     1,796,502     2,600,101
1986    238,096     1,920,225     2,840,557
1987    250,575     2,037,654     3,180,261
1988    269,219     2,182,571     3,406,902
1989    285,873     2,303,481     3,793,986
1990    305,010     2,490,909     4,546,915
1991    325,782     2,662,005     5,039,334
1992    343,239     2,846,607     5,622,164
1993    360,916     3,006,374     6,121,147
1994    387,062     3,228,240     7,022,499
1995    405,503     3,432,228     7,626,684

Derivation of θ

We now show how we define our prior on the probability of $x$ and $y$ being co-cited, based on the citation graph restricted to publications that cite either $x$ or $y$ (but not both) up to the year of their first co-citation. Recall that we defined a proxy co-citation of $x$ and $y$ to be an article that cites one member of the co-cited pair $(x, y)$ and also cites at least one article that cites the other member. The idea behind this definition is that we consider papers that cite $x$ as proxies for $x$, and papers that cite $y$ as proxies for $y$. Thus, if a paper $a$ cites both $x$ and $y'$ (where $y'$ is a proxy for $y$), then it is a proxy for a co-citation of $x$ and $y$. Similarly, if a paper $b$ cites both $y$ and $x'$ (where $x'$ is a proxy for $x$), it is also a proxy for a co-citation of $x$ and $y$. This motivates the graph-theoretic formulation, which we now formally present.

We fix the pair $x, y$ and we define $N(x)$ to be the set of all publications that cite $x$ (but do not also cite $y$), and are published no later than the year of the first co-citation of $x$ and $y$. We similarly define $N(y)$. We define a directed bipartite graph with vertex set $N(x) \cup N(y)$. Note that if $x$ cites $y$ then $x \in N(y)$, and similarly for the case where $y$ cites $x$. Note also that, since we have restricted $N(x)$ and $N(y)$ in this way, $N(x) \cap N(y) = \emptyset$. We now describe how the directed edge set $E(x, y)$ is constructed. For any pair of articles $a, b$ where $a \in N(x)$ and $b \in N(y)$, if $a$ cites $b$ then we include the directed edge $a \to b$ in $E(x, y)$. Similarly, we include edge $b \to a$ if $b$ cites $a$.
Finally, if a pair of articles both cite each other, then the graph has parallel edges. By construction, this graph is bipartite, which means that all the edges go between the two sets $N(x)$ and $N(y)$ (i.e., no edges exist between two vertices in $N(x)$, nor between two vertices in $N(y)$).

Note that by the definition, every edge in $E(x, y)$ arises because of a proxy co-citation, so that the number of proxy co-citations is the number of directed edges in $E(x, y)$. Consider the situation where a publication $a$ cites $x$ (so that $a \in N(x)$) and also cites $b_1, b_2, b_3$ in $N(y)$: this defines three directed edges from $a$ to nodes of $N(y)$. We count this as three proxy co-citations, not as one proxy co-citation. Similarly, if we have a publication $b$ that cites $y$ and also cites $a_1, a_2, a_3, a_4$ in $N(x)$, then there are four directed edges that go from $b$ to nodes in $N(x)$ and we count each of those directed edges as a different proxy co-citation.

Accordingly, letting $|X|$ denote the cardinality of a set $X$, we note that $|E(x, y)|$, i.e., the number of directed edges that go between $N(x)$ and $N(y)$, is the number of proxy co-citations between $x$ and $y$. If no parallel edges are permitted, the maximum number of possible proxy co-citations is $|N(x)| \times |N(y)|$. Under the assumption that $N(x)$ and $N(y)$ each have at least one article, we define $\theta(x, y)$, our prior on the probability of $x$ and $y$ being co-cited, as follows:

$$\theta(x, y) = \frac{|E(x, y)|}{|N(x)| \times |N(y)|}.$$

Note that if parallel edges do not occur in the graph, then $\theta(x, y) \le 1$, but that otherwise the value can be greater than 1. Note also that $\theta(x, y) = 0$ if $E(x, y) = \emptyset$ (i.e., if there are no proxy co-citations) and that $\theta(x, y) = 1$ if every possible proxy co-citation occurs.

To calculate θ efficiently, we used a graph-database pipeline.
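The definition of θ above can be illustrated independently of any database with a small in-memory sketch. This is not the authors' Cypher implementation: the function `theta`, the `cites` and `year` dictionaries, and the toy papers are hypothetical, and the sketch assumes the whole citation graph fits in memory. It builds $N(x)$ and $N(y)$, counts directed edges between them, and caps the result at 1, as the study does for the few values above 1.

```python
def theta(x, y, cites, year, first_cocitation_year):
    """Estimate theta(x, y) from a small citation graph.

    cites: dict mapping each paper to the set of papers it cites (assumed layout).
    year: dict mapping each paper to its publication year.
    """
    def neighborhood(target, other):
        # Papers citing `target` but not `other`, published no later than
        # the year of the first co-citation.
        return {
            p for p, refs in cites.items()
            if target in refs and other not in refs
            and year[p] <= first_cocitation_year
        }

    n_x = neighborhood(x, y)
    n_y = neighborhood(y, x)
    if not n_x or not n_y:
        raise ValueError("theta requires non-empty N(x) and N(y)")

    # Each directed citation between the two sides is one proxy co-citation.
    edges = sum(1 for a in n_x for b in n_y if b in cites[a])
    edges += sum(1 for b in n_y for a in n_x if a in cites[b])
    # Mutual citations can push the ratio above 1; cap it at 1.
    return min(1.0, edges / (len(n_x) * len(n_y)))


# Toy graph: c co-cites x and y in 1992, so it belongs to neither side.
toy_cites = {
    "a1": {"x", "b1"},
    "a2": {"x"},
    "b1": {"y"},
    "b2": {"y", "a2"},
    "c": {"x", "y"},
}
toy_year = {"a1": 1991, "a2": 1989, "b1": 1990, "b2": 1991, "c": 1992}
toy_theta = theta("x", "y", toy_cites, toy_year, first_cocitation_year=1992)
```

Here $N(x) = \{a1, a2\}$ and $N(y) = \{b1, b2\}$, with two proxy co-citations (a1 cites b1, and b2 cites a2) out of four possible, so the sketch yields 0.5.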
We copied the Scopus citation graph from a relational schema in PostgreSQL into the Neo4j 3.5 graph database using an automated Extract Transform Load (ETL) pipeline that combined Postgres CSV export and the Neo4j Bulk Import tool. The graph vertex set is all publications, each with a publication year attribute, and the edge set is all citations between the publications. A Cypher index was created on the publication year. We developed Cypher queries to calculate θ and tuned performance by splitting input publication pairs into small batches and processing them in parallel, using parallelization in Bash and GNU Parallel. Batch size, the number of parallel job slots, and other parameters were tuned for performance, with best results achieved on batch sizes varying from 20 to 100 pairs. The results of θ calculations were cross-checked using SQL calculations. In the small number of cases where θ computed to > 1 (above), it was set to 1 for the purpose of this study.

Statistical Calculations

We denote the observed co-citation frequency data by the multi-set $X^o = \{x^o_1, \ldots, x^o_N\}$, where $N$ is the total number of pairs of articles and $x^o_i$ is the observed frequency of the $i$th pair of papers being co-cited. Note that this is in general a multi-set, as different pairs of articles can have the same co-citation frequency. Let $n(x)$ be the number of times that $x$ appears in $X^o$ (equivalently, $n(x)$ is the number of pairs of articles that are co-cited $x$ times), and let $N(x) = \sum_{y=x}^{\infty} n(y)$ denote the total number of pairs of articles that are co-cited at least $x$ times. Then

$$f^o(x \mid x \ge x_{\min}) = \frac{n(x)}{N(x_{\min})} \quad \text{for } x \in [x_{\min}, \infty), \tag{1}$$

where $x_{\min}$ is a parameter we use to analyze the distribution's right tail starting at varying frequencies. We describe in this subsection (i) the statistical computations for fitting lognormal and power law distributions to right tails of the observed co-citation frequency distributions as defined by (1) for various $x_{\min}$ and (ii) how we assessed the quality of those fits. Further, we performed such analyses for various slices of the data, stratifying by θ and other parameters, as is described in the Results section.

We used a discrete version of a lognormal distribution to represent integer co-citation frequencies, $f_{LN}(\cdot)$, following [55] and [56], while appropriately normalizing for our conditional assessment of the right tail commencing at $x_{\min}$:

$$f_{LN}(x \mid \mu, \sigma, x_{\min}) = \frac{\tilde{f}(x \mid \mu, \sigma)}{\sum_{n=x_{\min}}^{\infty} \tilde{f}(n \mid \mu, \sigma)} \quad \text{for } x \ge x_{\min}, \tag{2}$$

$$\tilde{f}(x \mid \mu, \sigma) = \int_{x-0.5}^{x+0.5} \frac{dq}{q \sqrt{2\pi}\, \sigma} \exp\left(-\frac{(\ln q - \mu)^2}{2\sigma^2}\right),$$

where $\mu$ and $\sigma$ are the mean and standard deviation, respectively, of the underlying normal distribution. These probabilities can be computed with the cumulative normal distribution,

$$\tilde{f}(x \mid \mu, \sigma) = \Phi\left(\frac{\ln(x + 0.5) - \mu}{\sigma}\right) - \Phi\left(\frac{\ln(x - 0.5) - \mu}{\sigma}\right).$$

The distribution was fit for each $x_{\min}$ using a maximum (log) likelihood estimator (MLE).
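As a hedged illustration of equation (2), the following Python sketch evaluates the discrete lognormal tail probability with the standard library only. The function names are hypothetical and this is not the authors' C++ code. One simplification used here is our own observation rather than a statement from the text: because consecutive terms of the normalizing sum telescope, the denominator reduces to $1 - \Phi\left((\ln(x_{\min} - 0.5) - \mu)/\sigma\right)$. The sketch assumes $x_{\min} \ge 1$ so that the logarithm is defined.

```python
import math


def normal_sf(z):
    """Survival function 1 - Phi(z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lognormal_mass(x, mu, sigma):
    """Lognormal density integrated over [x - 0.5, x + 0.5] for integer x >= 1."""
    lo = (math.log(x - 0.5) - mu) / sigma
    hi = (math.log(x + 0.5) - mu) / sigma
    # Phi(hi) - Phi(lo), written with survival functions for tail accuracy.
    return normal_sf(lo) - normal_sf(hi)


def lognormal_tail_pmf(x, mu, sigma, x_min):
    """Probability of frequency x, conditional on x >= x_min (equation 2)."""
    if x < x_min:
        return 0.0
    # Telescoped normalizing sum over n = x_min, x_min + 1, ...
    tail = normal_sf((math.log(x_min - 0.5) - mu) / sigma)
    return lognormal_mass(x, mu, sigma) / tail


# Sanity check with illustrative parameters: the tail probabilities sum to ~1.
total = sum(lognormal_tail_pmf(x, 1.0, 0.5, 2) for x in range(2, 2001))
```

A log-likelihood for MLE could then be formed by summing `n(x) * log(lognormal_tail_pmf(x, mu, sigma, x_min))` over the observed tail frequencies.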
We solved for the best-fit distributional parameters for the lognormal distribution, $\mu$ and $\sigma$, by modifying a multi-dimensional interval search algorithm from [43] and following [56]. A compiled version of this code using the C++ header file, amoeba.h, is available on our Github site [30].

We fit a discrete power law distribution to the data for various values of $x_{\min}$, normalized for our conditional observations of the right tail:

$$f_{PL}(x \mid \alpha, x_{\min}) = \frac{x^{-\alpha}}{\zeta(\alpha, x_{\min})} \quad \text{for } x \ge x_{\min}, \tag{3}$$

where the Hurwitz zeta function,

$$\zeta(\alpha, x_{\min}) = \sum_{x=0}^{\infty} \frac{1}{(x + x_{\min})^{\alpha}},$$

is a generalization of the Riemann zeta function, $\zeta(\alpha, 1)$, as is needed for analysis of the right tail.

We solved first-order conditions for the (log) MLE to find the best-fit distributional exponent $\alpha$,

$$\frac{\zeta'(\alpha, x_{\min})}{\zeta(\alpha, x_{\min})} = -\frac{1}{N(x_{\min})} \sum_{x \in X^o(x_{\min})} \ln x, \tag{4}$$

as described in [13] and [25], where $X^o(x_{\min}) = \{x \in X^o : x \ge x_{\min}\}$ are the observed co-citations with frequencies at least as great as $x_{\min}$ and $N(x_{\min})$ is the number of such co-citations. We solved (4) to find $\alpha$ using a bisection algorithm.

We used the $\chi^2$ goodness of fit ($\chi^2$) and the Kolmogorov-Smirnov (K-S) tests to assess the null hypothesis that the distribution of the observed co-citation frequencies and the best-fit lognormal distribution are the same, and similarly for the best-fit power law distribution. We also computed the Kullback-Leibler Divergence (K-L) between the observed data and the best-fit distributions.

Both the $\chi^2$ and K-S tests employed the null hypothesis that the observed co-citation frequencies, $n(x)$ for $x \in [x_{\min}, \infty)$, were sampled from the best-fit lognormal or power law distributions, which we denote by $f_d(\cdot \mid x_{\min})$ for $d \in \{LN, PL\}$, while suppressing the parameters specific to each of the distributions.

The usual $\chi^2$ statistic was computed by, first, grouping each of the observed co-citation frequencies into $k$ bins, denoted by $b_i$ for $i \in \{1, \ldots, k\}$, and then computing

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$

where $O_i$ is the observed number of co-citations having frequencies associated with the $i$-th bin,

$$O_i = \sum_{x \in b_i} n(x),$$

and $E_i$ is the expected number of observations for frequencies in bin $i$, if the null hypothesis was true, in a sample with size equal to the number of observed data points, $N(x_{\min})$:

$$E_i = \sum_{x \in b_i} f_d(x \mid x_{\min})\, N(x_{\min}).$$

If the null hypothesis was true, then we would expect $O_i$ and $E_i$ to be approximately equal, with deviations owing to variability due to sampling.

Constructing the bins $b_i$ requires only that every $E_i$, $i = 1, \ldots, k$, meets a minimum expected count. Test outcomes are sometimes sensitive to the minimum $E_i$ permitted, which we denote by $E_{\min}$, and so we tested with multiple thresholds, including 10, 20, 50, and 70. Furthermore, statistical tests are stochastic: these multiple tests permitted a reduction in the probability of erroneously rejecting or accepting the null hypothesis based on a single test. The distribution of observed co-citation frequencies was skewed right with a long tail, so that aggregating bins to satisfy $E_i \ge E_{\min}$ was most critical in the right tail. This motivated a bin construction algorithm that aggregated frequencies in reverse order, starting with the extreme right tail. Algorithm 1 requires a set of the unique observed co-citation frequencies, $\hat{X}^o$, which includes the elements of the multiset $X^o$ without repetition. While Algorithm 1 does not guarantee in general that all bins satisfy $E_i \ge E_{\min}$, that criterion was satisfied for the observed data.

Algorithm 1

Frequency Bin Construction

    $i \leftarrow 1$
    $b_1 \leftarrow \{\}$
    while $|\hat{X}^o| > 0$ do
        $b_i \leftarrow b_i \cup \{\max(\hat{X}^o)\}$
        $\hat{X}^o \leftarrow \hat{X}^o \setminus \{\max(\hat{X}^o)\}$
        if $E_i \ge E_{\min}$ then
            $i \leftarrow i + 1$
            $b_i \leftarrow \{\}$
        end if
    end while

We implemented a K-S test using simulation to generate a sampling distribution to account for the discrete frequency observations [54]. We denote the cumulative distribution of observed co-citation frequencies by $F^o(x \mid x_{\min}) = \sum_{i=x_{\min}}^{x} f^o(i \mid x_{\min})$, and the best-fit cumulative distribution by $F_d(x \mid x_{\min}) = \sum_{i=x_{\min}}^{x} f_d(i \mid x_{\min})$. The K-S test involves testing the maximum absolute difference between the observed and theorized cumulative distributions,

$$D_n = \max_x \left| F^o(x \mid x_{\min}) - F_d(x \mid x_{\min}) \right|,$$

where $n$ is the number of observations giving rise to $F^o(x \mid x_{\min})$, against the distribution of such differences between samples from the theorized distribution with the same number of observations, $n$,

$$\tilde{D}_n = \max_x \left| \tilde{F}_{d,1}(x \mid x_{\min}) - \tilde{F}_{d,2}(x \mid x_{\min}) \right|,$$

where $\tilde{F}_{d,j}(x \mid x_{\min})$ is the empirical distribution of sample $j$ of size $n$ (notation suppressed) drawn from $F_d(x \mid x_{\min})$. We generated 100 such random variables $\tilde{D}_n$ for each test. We reject the null hypothesis if $D_n$ is larger than substantially all of the $\tilde{D}_n$, say all but 5%, for equivalence with a $p$-value of 0.05. The number of $\tilde{D}_n$ samples drawn yields a $p$-value with a resolution of 1%.

We computed the K-L Divergence two ways due to its asymmetry:

$$D_{K\text{-}L}(f^o \,\|\, f_d) = \sum_{x=x_{\min}}^{\infty} f^o(x \mid x_{\min}) \ln \frac{f^o(x \mid x_{\min})}{f_d(x \mid x_{\min})},$$

$$D_{K\text{-}L}(f_d \,\|\, f^o) = \sum_{x=x_{\min}}^{\infty} f_d(x \mid x_{\min}) \ln \frac{f_d(x \mid x_{\min})}{f^o(x \mid x_{\min})}.$$

Separate from the tests above, we used a $\chi^2$ test of the null hypothesis that the co-citation frequency distribution was independent of θ.
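Algorithm 1 and the $\chi^2$ statistic it feeds can be sketched in Python as follows. This is an illustrative reading of the pseudocode rather than the authors' code: `build_bins`, `chi_square`, the `expected_count` callback (standing in for $f_d(x \mid x_{\min})\,N(x_{\min})$), and the toy numbers are all assumptions. As in Algorithm 1, frequencies are consumed from the largest downward, and the final bin is not guaranteed to reach $E_{\min}$.

```python
def build_bins(frequencies, expected_count, e_min):
    """Group unique frequencies into bins, starting from the right tail.

    expected_count: function returning the expected number of observations
    at a given frequency under the fitted distribution (assumed input).
    """
    remaining = sorted(set(frequencies))  # ascending, so pop() yields the max
    bins = [[]]
    running = 0.0
    while remaining:
        x = remaining.pop()
        bins[-1].append(x)
        running += expected_count(x)
        if running >= e_min:
            # Current bin has enough expected mass; open the next one.
            bins.append([])
            running = 0.0
    if not bins[-1]:
        bins.pop()  # drop a trailing empty bin
    return bins


def chi_square(bins, observed, expected_count):
    """Sum of (O_i - E_i)^2 / E_i over the bins."""
    stat = 0.0
    for b in bins:
        o = sum(observed.get(x, 0) for x in b)
        e = sum(expected_count(x) for x in b)
        stat += (o - e) ** 2 / e
    return stat


# Toy example with hypothetical expected and observed counts.
toy_expected = {10: 6.0, 11: 4.0, 12: 3.0, 13: 2.0, 14: 1.0, 15: 1.0}
toy_observed = {10: 7, 11: 4, 12: 2, 13: 2, 14: 1, 15: 1}
toy_bins = build_bins(toy_expected, toy_expected.get, e_min=5)
toy_stat = chi_square(toy_bins, toy_observed, toy_expected.get)
```

With a threshold of 5, the four largest frequencies are merged into one bin (expected count 7) and the remaining two into another (expected count 10), which shows how the sparse right tail is aggregated first.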
We initially created a contingency table on θ and co-citation frequency using five bins for θ and four logarithmic bins for frequency (with lower bounds 10, 100, 1000, and 10000) to accommodate the skewed distributions. We subsequently aggregated these bins so that the expected number of co-citations in each bin was equal to or greater than 5, to account for the decreasing number of observations as θ and frequency increased, by using just two intervals for frequency (with lower bounds 10 and 100).

Kinetics of Co-citation

We extended prior work on delayed recognition and the Sleeping Beauty phenomenon [27, 59, 33, 23] towards co-citation. We modified the beauty coefficient (B) of [27] to address co-citations by: (i) counting citations to a pair of publications (co-citations) rather than citations to individual papers, (ii) setting $t_0$ (age zero) to the first year in which a pair of publications could be co-cited (i.e., the publication year of the more recently published member of a co-cited pair), and (iii) setting $C_0$ to the number of co-citations occurring in year $t_0$. Rather than calculate awakening time as in [27], we opted to measure the simpler length of time between $t_0$ and the first year in which a co-citation was recorded; we label this measurement as the timelag $t_l$, so that $t_l = 0$ if a co-citation was recorded in $t_0$.

Results and Discussion

Our base dataset, described in Table 1, consists of the 33,641,395 co-cited reference pairs (33.6 million pairs) and their co-citation frequencies, gathered from Scopus during the 11-year period 1985-1995 (Materials and Methods). A striking distribution of co-citation frequencies with a long right tail is observed, with a minimum co-citation frequency of 1, a median of 2, and a maximum of 51,567 (Figure 2). Approximately 33.3 of 33.6 million pairs (99% of observations) have co-citation frequencies ranging from 1 to 67, and the remaining 1% have co-citation frequencies ranging from 68 to 51,567. Since the focus of our study was co-citations of frequently cited publications, we further restricted this dataset to those pairs with a co-citation frequency of at least 10, which resulted in a smaller dataset of 4,119,324 co-cited pairs (4.1 million pairs) with a minimum co-citation frequency of 10, a median of 18, and a maximum of 51,567. In order to focus on co-citations derived from highly cited publications, θ was calculated for all pairs with a co-citation frequency of at least 10. We also note whether one article in a co-citation pair cites the other (connectedness).

Influenced by the use of linked co-citations for clustering [52], we also examined the extent to which members of a co-cited pair were also found in other co-cited pairs. We found that 205,543 articles contributed to 4.12 million co-cited pairs. The highest frequency observed in our dataset, 51,567 co-citations, was for a pair of articles from the field of physical chemistry: Becke (1993) [3] and Lee, Yang, and Parr (1988) [32]. The members of this pair are not connected and are found in a total of 1,504 co-cited pairs with frequencies ranging from 10 to 51,567. The second highest frequency, 28,407 co-citations, was for a pair of articles from the field of biochemistry: [31, 9].
Members of this pair are not connected and are found in 41,909 co-cited pairs, 24,558 for the Laemmli gel electrophoresis article and 17,352 for the Bradford protein estimation article. In terms of this second pair, both articles describe methods heavily used in biochemistry and molecular biology, an area with strong referencing activity, so this result is not entirely surprising.

Having developed $\theta(x, y)$ as a prediction of the probability that articles $x$ and $y$ would be co-cited, we first tested whether the distribution of co-citation frequencies was independent of θ (Materials and Methods). The null hypothesis that the co-citation frequency distribution was independent of θ was rejected with a very small $p$-value: the statistical software indicated a $p$-value with no significant non-zero digits. We next investigated what distribution functions might fit the frequencies of co-citation as θ varied.

Based on the long tails of citation frequencies, prior research has assessed the fit of lognormal and power law distributions [55, 47, 56]. We noted long right tails in co-citation frequencies, which, similarly, motivated us to assess the fit of lognormal and power law distributions to co-citation data. Further, we stratified the data according to (i) the minimum frequency for the right tail $x_{\min}$, (ii) θ, and (iii) whether the two members of each co-citation pair were connected. Figure 3 shows which distribution, if either, fits the data in each slice, based on tests of statistical significance. Note that there were no circumstances where both distributions fit: if one fit, then the other did not.

Statistical tests were not possible for some slices due to an insufficient number of data points. This was the case for certain combinations of large $x_{\min}$, large θ, and co-citations that were not connected.
The number of data points necessarily decreases as $x_{\min}$ increases, and we found the decrease to be more precipitous when θ was large and co-citations were unconnected, due to the lighter right tails for these parameter combinations. The graph in the right panel of Figure 4, which has a logarithmic $y$-axis, shows that the number of data points per θ interval analyzed decreases most often by more than an order of magnitude from one interval to the next as θ increases. Most pairs of publications that are co-cited at least ten times, therefore, have small values of θ.

Figure 3 indicates when the null hypothesis of a best-fit lognormal or power law fitting the observed data cannot be rejected. We computed two types of statistics for evaluating the null hypothesis ($\chi^2$ and K-S) and, moreover, we computed the $\chi^2$ statistic for four binning strategies. Figure 3 indicates a distributional fit, specifically, if either the K-S $p$-value is greater than 0.05 or if two or more of the $\chi^2$ $p$-values are greater than 0.05. While we computed the K-L Divergence (see supplementary material), we did not use these computations for formal statements of distributional fit because they are neither a norm nor do they determine statistical significance. These K-L computations did, however, support the findings based on formal tests of statistical significance.

Power law distributions fit most often when co-citations are connected (Fig. 3), when more extreme right tails are considered, and when co-citations have small values of θ. Lognormal distributions fit, conversely, in some circumstances, when a greater portion of the right tail is considered. These observations support the existence of heavy tails for small θ, even if a lognormal distribution fits the observed data more broadly.
This observation is consistent with our observations of the most frequent co-citations having small θ values, as shown in the scatter plot in the left panel of Figure 4.

Mitzenmacher [38] shows a close relationship between the power law and lognormal distributions vis-à-vis subtle variations in generative mechanisms that determine whether the resulting distribution is power law or lognormal. The stratified layers in Figure 3 where a lognormal distribution fits for some portion of the right tail and, in the same instance, a power law describes the more extreme tail, may, therefore, be due to a generative mechanism whose parameters are close to those for a power law distribution as well as those for a lognormal distribution.

Table 2: Exponents of best-fit power law distributions

These observations are for power law exponents where comparison across intervals of θ was possible, and where statistical tests indicated that a power law was a good fit to the data. The articles of the co-citations were connected for all data shown.

[Table 2 columns: right-tail cutoff ($x_{\min}$), θ interval, power law exponent ($\alpha$). The table has six rows, beginning at a cutoff of 200; the interval endpoints and exponent values are not recoverable from this copy.]

Comparisons of power law exponents across θ were possible for two θ intervals, for connected co-citations, and for right tails commencing at three values of $x_{\min}$. The power law exponent $\alpha$ in these comparisons was smaller for the lower θ interval than for the higher one, indicating heavier tails for small θ and, therefore, a greater chance of extreme co-citation frequency. Figure 5 shows a log-log plot of the number of co-citations ($y$-axis) exhibiting the counts on the $x$-axis, for a single θ interval (note that both axes employ log scaling). The pattern for points below the 99th percentile clearly indicates that the number of co-citations referenced at a given frequency decreases greatly as the frequency increases. Also, the broadening of the scatter where fewer co-citations are cited more frequently is indicative of a long right tail, as has been observed in other research where lognormal or power law distributions have been fit to data, as in [39].

Perline [42] warns against fitting a power law function to truncated data. Informally, a portion of the entire data set can appear linear on a log-log plot, while the entire data set would not. He cites instances where researchers have mistakenly characterized an entire data set as following a power law due to an analysis of only a portion of the data, when a lognormal distribution might provide a better fit to the entire data set. Indeed, the scatter plot in Figure 5 is not linear and so, as Figure 3 shows, a power law does not fit the entire data set. This is what Perline calls a weak power law, where a power law distribution function fits the tail, but not the entire distribution.
Our concern, however, is not with characterizing the distributional function for the entire data set, but with characterizing the features of high frequency co-citations, which by definition means we were concerned with the right tail of the distribution. Moreover, the results avoid confusion between lognormal and power law distribution functions because we have shown not only that a power law provides a statistically significant fit, but also that a lognormal distribution function does not fit.

Our analysis found particularly heavy tails that were well fit by power law distributions for small θ, in the intervals [0 . , . and [0 . , . , and for co-citations whose constituents are connected, as shown in Fig. 3. The closely related Matthew Effect [36], cumulative advantage [45], and the preferential attachment class of models [1] provide a possible explanation for citation frequencies following a power law distribution for some sufficiently extreme portion of the right tail. For greater values of θ, insufficient data in the right tails precludes a definitive assessment in this regard, although one might argue that the lack of observations in the tails is counter to the existence of a power law relationship. It is also noteworthy that the exponents we found for co-citations (Table 2) are close in value to those reported for citations by [45] and [47].

Delayed Co-citations

The delayed onset of citations to a well cited publication, also referred to as ‘Delayed Recognition’ and ‘Sleeping Beauty’, has been studied by Garfield, van Raan, and others [21, 60, 27, 59, 33, 23, 5]. We sought to extend this concept to frequently co-cited articles. As an initial step, we calculated two parameters (Materials and Methods): (1) the beauty coefficient [27] modified for co-cited articles and (2) time lag t_l, the length of time between the first possible year of co-citation and the first year in which a co-citation was recorded. We further restricted our consideration of delayed co-citations to the 95th percentile or greater of co-citation frequencies in our dataset of 4.1 million co-cited pairs. Within the bounds of this restriction, 24 co-cited pairs have a beauty coefficient of 1,000 or greater and all 24 are in the 99th percentile of co-citation frequencies. Thus, very high beauty coefficients are associated with high co-citation frequencies.

We also examined the relationship of t_l with co-citation frequencies (Fig. 6) and observed that high t_l values were associated with lower co-citation frequencies. These data appear to be consistent with a report from van Raan and Winnink [60], who conclude that ‘probability of awakening after a period of deep sleep is becoming rapidly smaller for longer sleeping periods’. Further, when two articles are connected, they tend to have smaller t_l values compared to pairs that are not connected in the same frequency range.

Figures

Figure 2: The x-axis shows percentiles for all three plots

Left Side

Co-citation frequencies of highly cited publications from Scopus 1985-1995

Co-citation frequencies are plotted against their percentile values. The upper and lower plots were both generated from 33,641,395 data points. The lower plot shows the same data with a logarithmic (ln) transformation of the y-axis. The minimum co-citation frequency is 1, the median is 2, the third quartile is 4, and the maximum is 51,567. Additionally, 15,140,356 pairs (45%) have a co-citation frequency of 1. Frequencies of 12, 22, 67, and 209 correspond to quantile values of 0.9, 0.95, 0.99, and 0.999 respectively.

Right Side

Direct citations between members of a co-cited pair (connectedness) increase with co-citation frequency. The proportion of connected pairs (a direct citation exists between the two members of a pair) within each percentile is shown. Data are plotted for all pairs with a co-citation frequency of at least (4.1 million pairs).

Figure 3: Distributional fits to the observed co-citation frequencies

The graph shows where a lognormal or power law distribution demonstrated a statistically significant fit with the observed co-citation frequencies stratified by θ, extent of the right tail tested x, and whether co-citations were connected. A power law fit more often for θ in the intervals [0 . , . and [0 . , . when co-citation constituents were connected. When a lognormal distribution fit, it was for broader portions of the data set. Data were insufficient for testing as θ increased due to (i) fewer observations and (ii) less prominent right tails.

(a) Co-citation Scopus frequency versus θ (b) Number of co-cited pairs per θ interval

Figure 4: Co-citation dynamics relative to θ. (a) Points represent the Scopus frequency vs. θ value for each co-cited pair. Darker regions indicate denser plots of the translucent points. Co-cited pairs with the greater frequency are observed for pairs with smaller θ. (b) The y-axis employs a log scale and shows the number of co-cited pairs per θ interval. The number of co-cited pairs decreases, most often, by more than an order of magnitude per interval as θ increases. The dominance of co-cited pairs with smaller θ is also reflected by regions of greater density in panel (a).

Figure 5: Log-log plot of the number of co-citations versus co-citation count for θ ∈ [0 . , . The y-axis shows the number of co-cited pairs observed having the citation counts plotted along the x-axis. The tightly clustered plot below the 99th percentile demonstrates a clear pattern of decreasing number of co-cited pairs having an increasing number of citation counts. The scatter plot for the tail above the 99th percentile broadens, indicating a long tail of relatively few co-cited pairs that were cited with extreme frequency.

Figure 6: Relationship between time lag (t_l) and co-citation frequency. Extended lag times are associated with lower co-citation frequencies. Connected pairs have lower t_l values.
Data are shown for 207,214 pairs consisting of ≥ t_l, the time between first possible co-citation and first co-citation.

[Figure 7 plots. Panels: example_1, example_2. x-axis: Publication Year. y-axis: Frequency. Series: pub1, pub2, cocite.]

Figure 7:

Co-citation frequencies of highly cited publications from Scopus 1985-1995

Upper panel

Publication 1: Instability of the interface of two gases accelerated by a shock wave (1972) doi: 10.1007/BF01015969, first cited (1993), total citations (566). Publication 2: Taylor instability in shock acceleration of compressible fluids (1960) doi: 10.1002/cpa.3160130207, first cited (1973), total citations (566), first co-cited (1993), total co-citations (541).

Lower Panel

Publication 1: Colorimetric assay of catalase (1972) doi: 10.1016/0003-2697(72)90132-7, first cited (1972), total citations (2683). Publication 2: Levels of glutathione, glutathione reductase and glutathione S-transferase activities in rat lung and liver (1979) doi: 10.1016/0304-4165(79)90289-7, first cited (1979), total citations (2464), first co-cited (1979), total co-citations (470).

Conclusions

In this article, we report on our exploration of features that impact the frequency of co-citations. In particular, we wished to examine article pairs with high co-citation frequencies with respect to whether they originated from the same school(s) of thought or represented novel combinations of existing ideas. However, defining a discipline is challenging, and determining the discipline(s) relevant to specific publications remains difficult. Journal-level classifications of disciplines have known limitations and, while article-level approaches offer some advantages, they are not free of their own limitations [37].

Consequently, we designed θ, a statistic that examines the citation neighborhood of a pair of articles x and y to estimate the probability that they would be co-cited. Our approach has advantages compared to alternate approaches: it avoids the challenges of journal-level analyses, it does not require a definition of “discipline” (or “disciplinary distance”), it does not require assignment of disciplines to articles, it is computationally feasible, and, most importantly, it enables an evaluation that is specific to a given pair of articles.

We note that when x and y are from the same sub-field, then θ may be very large, and conversely, when x and y are from very different fields, it might be reasonable to expect that θ will be small. Thus, in a sense, θ may correlate with disciplinary similarity, with large values for θ reflecting conditions where the two publications are in the same (or very close) sub-disciplines, and small values for θ reflecting that the disciplines for the two publications are very distantly related. We also comment that in this initial study, we have not considered second-degree information, that is, publications that cite publications that cite an article of interest.

Our data indicate that the most frequent co-citations occur when co-citations have small values of θ, as shown in Figure 4.
Our study considered the hypothesis that the frequency distribution is independent of θ, but our statistical tests rejected this hypothesis, and showed instead that the frequency distribution is best characterized by a power law for small values of θ and connected publications, and in many other regions is best characterized by a lognormal distribution.

The observation that power laws fit the data for small values of θ and connected co-citations is consistent with the theory of preferential attachment for these parameter settings. To the extent that preferential attachment is the mechanism giving rise to a power law, this suggests that preferential attachment is, at least, stronger for small θ values and connected co-citations than for other parameter combinations, or that preferential attachment is not applicable to other parameter values.

Observing power laws, heavy tails, and pairs with extreme co-citation strength for small values of θ (i.e., pairs that have small a priori probabilities of being co-cited) may seem, on its face, paradoxical. One possible explanation of the pairs in the extreme right tail with both small θ and large co-citation strength is that those pairs represent novel combinations of ideas that, when recognized within the research community, catalyze an increased citation rate, consistent with preferential attachment coupled to time-dependent initial attractiveness [20] as an underlying generative mechanism. However, small values of θ do not guarantee a high co-citation count: indeed, even for small values of θ, co-citations with a power law predominantly have relatively low co-citation strength.

We also note the increasing proportion of connected pairs as the percentile for co-citation frequency increases (Fig. 2); this pair of parameters appears to be associated with a fertile environment where extremely high co-citation frequencies are possible.
This observation raises the question of whether small values of θ and connected co-citations are associated with preferential attachment and, if a causal relationship exists, how θ and co-citation connection provide an environment supporting preferential attachment. A possibility is that one article in a co-cited pair citing the other makes the potential significance of the combination of their ideas apparent to researchers. The clear pattern of the highest frequency co-cited pairs typically having low θ values suggests that these pairs are highly cited and hence impactful because of the novelty in the ideas or fields that are combined (as reflected in low θ). However, other factors should be considered, such as the prominence of authors and prestige of a journal [22] where the first co-citation appears.

We did not apply field-normalization techniques when assembling the parent pool of 768,993 highly cited articles consisting of the top 1% of highly cited articles from each year in the Scopus bibliography. Thus, the highly co-cited pairs we observe are biased towards high-referencing areas such as biomedicine and parts of the physical sciences [51]. However, the dataset we analyzed has a lower bound of 10 on co-citation frequencies and includes pairs from fields other than those that are high referencing. For example, the maximum t_l we observed in the dataset of 4.1 million pairs was 149 years, and is associated with a pair of articles independently published in 1840, establishing their eponymous Staudt-Clausen theorem [12, 61]; this pair of articles was apparently co-cited 10 times since their publication. A second pair of articles concerning electron theory of metals [17, 18] was first co-cited in 1994 for a total of 109 times, with an observed t_l of 94 years. Both cases are drawn from mathematics and physics rather than the medical literature. They are also consistent with the suggestion that the probability of awakening is smaller after a period of deep sleep [60].
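The time lag arithmetic for the cases above can be checked with a small sketch. It assumes that the first possible year of co-citation is the publication year of the later article in the pair; this reading agrees with the electron theory example (1994 − 1900 = 94) but is an assumption here, and the function name is hypothetical.

```python
def time_lag(pub_year_x, pub_year_y, first_cocited_year):
    """Years between the first possible co-citation and the first recorded one.

    Assumption: a pair can first be co-cited in the year the later of the
    two articles was published.
    """
    first_possible = max(pub_year_x, pub_year_y)
    if first_cocited_year < first_possible:
        raise ValueError("co-citation recorded before both articles existed")
    return first_cocited_year - first_possible


if __name__ == "__main__":
    # Electron theory of metals pair [17, 18]: both published 1900, first co-cited 1994.
    print(time_lag(1900, 1900, 1994))
    # Figure 7, upper panel: published 1972 and 1960, first co-cited 1993.
    print(time_lag(1972, 1960, 1993))
```

Under the same assumption, the reported t_l of 149 years for the pair published in 1840 would place its first recorded co-citation in 1989.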
As we have defined t_l, with its heavy penalty for early citation, we create additional sensitivity to coverage and data quality, especially for pairs with low citation numbers. Indeed, for the Staudt-Clausen pair, a manual search of other sources revealed an article [11] in which they are co-cited. Both these articles were originally published in German and it is possible that additional co-citations were not captured. Thus, big data approaches that serve to identify trends should be accompanied by more meticulous case studies, where possible. Other approaches for examining depth of sleep and awakening time should certainly be considered [59, 27]. Lastly, using our approach to revisit invisible colleges [46, 16, 52] seems warranted, since the upper bound of a hundred members predicted by [46] is likely to have increased in a global scientific enterprise with electronic publishing and social media.

Finally, we view these results as a first step towards further investigation of co-citation behavior, and we introduce a new technique based on exploring first-degree neighbors of co-cited publications; we are hopeful that this graph-theoretic study will stimulate new approaches that will provide additional insights, and prove complementary to other article level approaches.

Acknowledgments

In addition to support through federal funding, the ERNIE project features a collaboration with Elsevier. We thank our colleagues from Elsevier for their support of the collaboration.

Competing Interests

The authors have no competing interests. Scopus data used in this study was available to us through a collaborative agreement with Elsevier on the ERNIE project. Elsevier personnel played no role in conceptualization, experimental design, review of results, or conclusions presented. The content of this publication is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or Elsevier. Sitaram Devarakonda’s present affiliation is Randstad USA. His contributions to this article were made while he was a full-time employee of NET ESolutions Corporation.

Author Contributions

Conceptualization, GC, JB, SD, and TW; Methodology, AD, DK, GC, JB, SD, SL, and TW; Investigation, DL-H, GC, JB, and SD; Writing - Original Draft, GC, JB, TW; Writing - Review and Editing, AD, DK, DL-H, GC, JB, SD, SL, and TW; Funding Acquisition, GC; Resources, DK and GC; Supervision, GC. Authors are listed in alphabetic order.

References [1]

Albert, R., and Barabási, A.-L. Statistical mechanics of complex networks.

Reviews of Modern Physics 74, 1 (2002), 47–97. [2]

Barber, B.

Resistance by scientists to scientiﬁc discovery.

Science 134 (1961), 596–602.[3]

Becke, A. D.

Density-functional thermochemistry. III. The role of exact exchange.

The Journal of Chemical Physics 98, 7 (Apr. 1993), 5648–5652. Publisher: American Institute of Physics. [4]

Bornmann, L., Leydesdorff, L., and Mutz, R.

The use of percentiles and percentile rank classes in the analysis of bibliometric data: Opportunities and limits.

Journal of Informetrics 7 , 1 (2013), 158–165.[5]

Bornmann, L., Ye, A. Y., and Ye, F. Y.

Identifying “hot papers” and papers with “delayed recognition” in large-scale datasets by using dynamically normalized citation impact scores.

Scientometrics 116 , 2 (Aug. 2018), 655–674.[6]

Boyack, K., and Klavans, R.

Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately?

Journal of the American Society for Information Science and Technology 61 , 12 (2010),2389–2404.[7]

Boyack, K., and Klavans, R.

Atypical combinations are confounded by disciplinary effects. In

International Conference on Science and Technology Indicators (Leiden, Netherlands, 2014), CWTS-Leiden University, pp. 49–58. [8]

Braam, R. R., Moed, H. F., and Raan, A. F. J. v.

Mapping of science by combined co-citation and word analysis. I. Structural aspects.

Journal of the American Society for Information Science 42, 4 (1991), 233–251. [9]

Bradford, M. M.

A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding.

Analytical Biochemistry 72 (May 1976), 248–254. [10]

Bradley, J., Devarakonda, S., Davey, A., Korobskiy, D., Liu, S., Lakhdar-Hamina, D., Warnow, T., and Chacko, G.

Co-citations in context: Disciplinary heterogeneity is relevant.

Quantitative Science Studies (2020), 1–13.[11]

Carlitz, L.

The Staudt-Clausen Theorem.

Mathematics Magazine 34 (1961), 131–146.[12]

Clausen, T.

Theorem.

Astronomische Nachrichten 17 (1840), 351–352.[13]

Clauset, A., Shalizi, C. R., and Newman, M. E. J.

Power-Law Distributions in Empirical Data.

SIAM Review 51 , 4 (Nov. 2009), 661–703.[14]

Colavizza, G., Boyack, K., Eck, N. J. v., and Waltman, L.

The Closer the Better: Similarity of Publication Pairs at Different Cocitation Levels.

Journal of the Association for Information Science and Technology 69, 4 (2018), 600–609. [15]

Cole, S.

Professional Standing and the Reception of Scientiﬁc Discoveries.

American Journal of Sociology 76, 2 (1970), 286–306. [16]

Crane, D.

Invisible Colleges: Diffusion of Knowledge in Scientific Communities. University of Chicago Press, Chicago, 1972. [17]

Drude, P.

Zur Elektronentheorie der Metalle.

Annalen der Physik 306 (1900), 566–613.[18]

Drude, P.

Zur Elektronentheorie der Metalle; II. Teil. Galvanomagnetische und thermomagnetische Effecte.

Annalen der Physik 308 (1900), 369–402.[19]

Elsevier BV. Scopus, 2019. Accessed Dec 2019. [20]

Eom, Y.-H., and Fortunato, S.

Characterizing and modeling citation dynamics.

PLOS ONE 6 , 9 (09 2011), 1–7.[21]

Garfield, E.

Would Mendel’s work have been ignored if the Science Citation Index was available 100 years ago?

Essays of an Information Scientist 1 (1970), 69–70.[22]

Garfield, E.

Premature Discovery or Delayed Recognition - Why?

Essays of an Information Scientist 4 (1980), 488–493. [23]

Glänzel, W., and Garfield, E.

The myth of delayed recognition.

Scientist 18, 11 (June 2004). [24]

Gmür, M.

Co-citation analysis and the search for invisible colleges: A methodological evaluation.

Scientometrics 57 , 1 (May 2003), 27–57.[25]

Goldstein, M. L., Morris, S. A., and Yen, G. G.

Problems with fitting to the power-law distribution.

The European Physical Journal B - Condensed Matter and Complex Systems 41, 2 (Sept. 2004), 255–258. [26]

Gómez, I., Bordons, M., Fernàndez, M., and Méndez, A.

Coping with the problem of subject classification diversity.

Scientometrics 35 (02 1996), 223–235.[27]

Ke, Q., Ferrara, E., Radicchi, F., and Flammini, A.

Defining and identifying Sleeping Beauties in science.

Proceedings of the National Academy of Sciences 112, 24 (June 2015), 7426–7431. [28]

Kessler, M. M.

Bibliographic coupling between scientific papers.

American Documentation 14, 1 (1963), 10–25. https://onlinelibrary.wiley.com/doi/pdf/10.1002/asi.5090140103. [29]

Klavans, R., and Boyack, K.

Which type of citation analysis generates the most accurate taxonomy of scientific and technical knowledge?

Journal of the Association for Information Science and Technology 68 (04 2017), 984–998. [30]

Korobskiy, D., Davey, A., Liu, S., Devarakonda, S., and Chacko, G.

Enhanced Research Network Informatics Environment (ERNIE). Github repository, NET ESolutions Corporation, 2019. [31]

Laemmli, U. K.

Cleavage of structural proteins during the assembly of the head of bacteriophage T4.

Nature 227 , 5259 (Aug. 1970), 680–685.[32]

Lee, C., Yang, W., and Parr, R. G.

Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density.

Physical Review B 37, 2 (Jan. 1988), 785–789. Publisher: American Physical Society. [33]

Li, J., and Ye, F. Y.

Distinguishing sleeping beauties in science.

Scientometrics 108, 2 (Aug. 2016), 821–828. [34]

Marshakova-Shaikevich, I.

System of Document Connections Based on References.

Scientiﬁc and Technical Information Serial of VINITI 6 , 2 (1973), 3–8.[35]

Merton, R.

Resistance to the systematic study of multiple discoveries in science.

European Journal of Sociology 4 , 2 (1963), 37–282.[36]

Merton, R.

The Matthew Eﬀect in science.

Science 159 , 3810 (1968), 56–63.[37]

Milojevic, S.

Practical method to reclassify web of science articles into unique subject categories and broad disciplines.

Quantitative Science Studies (12 2019), 1–24.[38]

Mitzenmacher, M.

A brief history of generative models for power law and lognormal distributions.

Internet Mathematics 1 (2003), 226–251.[39]

Montebruno, P., Bennett, R. J., van Lieshout, C., and Smith, H.

A tale of two tails: Do Power Law and Lognormal models fit firm-size distributions in the mid-Victorian era?

Physica A: Statistical Mechanics and its Applications 523 (June 2019), 858–875. [40]

Newman, M. E. J.

The Structure and Function of Complex Networks.

SIAM Review 45, 2 (Jan. 2003), 167–256. [41]

Noma, E.

Co-citation analysis and the invisible college.

Journal of the American Society for Information Science 35, 1 (1984), 29–33. [42]

Perline, R.

Strong, weak and false inverse power laws.

Statistical Science 20, 1 (2005), 68–88. [43]

Press, W., Teukolsky, S., Vetterling, W., and Flannery, B.

Numerical recipes in C: The art of scientific computing (3rd ed.). Cambridge University Press, New York, September 2007. [44]

Price, D. d. S.

Networks of Scientiﬁc Papers.

Science 149 (1965), 510–515. [45]

Price, D. d. S.

A general theory of bibliometric and other cumulative advantage processes.

Journal of the American Society for Information Science 27, 5 (Sept. 1976), 292–306. [46]

Price, D. d. S., and Beaver, D. D.

Collaboration in an invisible college.

American Psychologist 21, 11 (1966), 1011–1018. [47]

Radicchi, F., Fortunato, S., and Castellano, C.

Universality of citation distributions: toward an objective measure of scientific impact.

PNAS 105, 45 (2008), 17268–17272. [48]

Redner, S.

Citation statistics from 110 years of physical review.

Physics Today 58, 6 (2005), 49–54. [49]

Shu, F., Julien, C.-A., Zhang, L., Qiu, J., Zhang, J., and Larivière, V.

Comparing journal and paper level classiﬁcations of science.

Journal of Informetrics 13, 1 (Feb. 2019), 202–225. [50]

Small, H.

Co-citation in the scientific literature: A new measure of the relationship between two documents.

Journal of the American Society for Information Science 24, 4 (1973), 265–269. [51]

Small, H., and Greenlee, E.

Citation context analysis of a co-citation cluster: Recombinant-DNA.

Scientometrics 2 , 4 (July 1980), 277–301.[52]

Small, H., and Sweeney, E.

Clustering the science citation index® using co-citations. Scientometrics 7, 3 (Mar. 1985), 391–409. [53]

Small, H., Sweeney, E., and Greenlee, E.

Clustering the science citation index using co-citations. II. Mapping science.

Scientometrics 8 , 5 (Nov. 1985), 321–340.[54]

StackExchange. Can I use the Kolmogorov–Smirnov test on my Data?, 2014. https://stats.stackexchange.com/questions/112910/can-i-use-kolmogorov-smirnov-test-on-my-data. [55]

Stringer, M. J., Sales-Pardo, M., and Amaral, L. A. N.

Effectiveness of journal ranking schemes as a tool for locating information.

PLoS ONE 3, 2 (2008), e1683. [56]

Stringer, M. J., Sales-Pardo, M., and Amaral, L. A. N.

Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal.

Journal of the American Society for Information Science and Technology 61, 7 (2010), 1377–1385. [57]

Uzzi, B., Mukherjee, S., Stringer, M., and Jones, B.

Atypical Combinations and Scientific Impact.

Science 342, 6157 (Oct. 2013), 468–472. [58] van Raan, A. F. J.

Fractal dimension of co-citations.

Nature 347, 6294 (Oct. 1990), 626–626. [59] van Raan, A. F. J.

Sleeping Beauties in science.

Scientometrics 59, 3 (Mar. 2004), 467–472. [60] van Raan, A. F. J., and Winnink, J. J.

The occurrence of ‘sleeping beauty’ publications in medical research: Their scientific impact and technological relevance.

PLOS ONE 14, 10 (10 2019), 1–34. [61] von Staudt, K.

Beweis eines Lehrsatzes, die Bernoullischen Zahlen betreﬀend.

Journal für die reine und angewandte Mathematik 21 (1840), 372–374. [62]

Wallace, M., Larivière, V., and Gingras, Y.

Modeling a century of citation distributions.

Journal of Informetrics 3 , 4 (2009), 296–303.[63]

Waltman, L., and van Eck, N. J.

A new methodology for constructing a publication-level classification system of science.

Journal of the American Society for Information Science and Technology 63, 12 (2012), 2378–2392. [64]

Wang, D., Song, C., and Barabási, A.-L.

Quantifying Long-Term Scientific Impact.

Science 342 , 6154 (Oct. 2013), 127–132.[65]

Wang, J., Veugelers, R., and Stephan, P.

Bias against novelty in science: A cautionary tale for users of bibliometric indicators.