Can We ‘Feel’ the Temperature of Knowledge? Modelling Scientific Popularity Dynamics via Thermodynamics

Abstract

Just like everything in nature, scientific topics flourish and perish. While the existing literature captures an article's life cycle well via citation patterns, little is known about how scientific popularity and impact evolve for a specific topic. It would be most intuitive if we could ‘feel’ a topic's activity just as we perceive the weather by temperature. Here, we conceive knowledge temperature to quantify a topic's overall popularity and impact through citation network dynamics. Knowledge temperature comprises 2 parts. One part depicts lasting impact by assessing knowledge accumulation through an analogy between topic evolution and isobaric expansion. The other part gauges temporal changes in knowledge structure, an embodiment of short-term popularity, through the rate of entropy change with internal energy, 2 thermodynamic variables approximated via node degree and edge number. Our analysis of representative topics, with sizes ranging from 1000 to over 30000 articles, reveals that the key to flourishing is a topic's ability to accumulate useful information for future knowledge generation. Topics experience temperature surges in particular when their knowledge structure is altered by influential articles. The spike is especially obvious when a single non-trivial novel research focus appears or when parts of the topic structure merge. Overall, knowledge temperature manifests topics' distinct evolutionary cycles.


Luoyi Fu, Dongrui Lu, Qi Li, Xinbing Wang, and Chenghu Zhou

Electronic Engineering, Shanghai Jiao Tong University, China
State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, China

* Corresponding author. + These authors contributed equally to this work.

Scientific impact assessment helps shape scientific development in areas including investment, promotion policy, and individual careers. Because of its significance and widespread applications, measuring scientific impact has long been one of the most discussed topics across disciplines. Citation-based analysis plays a predominant role in impact assessment because citations are quantitative and, more importantly, because citation counts correlate positively with scientific influence. For an article, citation dynamics reveal the temporal evolution of its impact and popularity. For a researcher, the evolution of individual citation statistics portrays his or her activity, scholarly impact dynamics, and research interest pattern. For a scientific topic, however, modelling the citation dynamics of individuals or articles fails to characterize the life cycle, because this one-dimensional indicator cannot exploit the interplay among academic entities. This raises a fundamental question: how can we depict the rise and fall of a scientific topic by leveraging its citation information?

The first step towards answering this question is to define a scientific topic and then to find an appropriate way to describe it. A scientific topic is in fact a complex network comprising articles that have similar research interests. As citations display the interaction among articles, we can define and represent a scientific topic by its citation network.
By retrieving and integrating academic data from renowned databases including but not limited to DBLP, arXiv, Elsevier and Springer, we identified 47310 articles that have gained over 1000 citations and have had a non-trivial influence within their research fields. These articles were published between 1800 and 2019 and their research interests cover 294 domains in 16 disciplines: History, Computer science, Environmental science, Geology, Psychology, Mathematics, Physics, Materials science, Philosophy, Biology, Medicine, Sociology, Art, Economics, Chemistry and Political science. Some of them created new topics while others made major breakthroughs in existing fields. Their immense contribution and inspiration to subsequent research has made each of them a leader in its field. We therefore refer to these papers as pioneering works and define the scientific topic led by each to be a citation network that consists of the pioneering work, its child papers (all the articles that directly cite the pioneering work), and all the citations among them. We visualize our scientific topics with a graph that we call a galaxy map. The galaxy map not only highlights the most influential child papers along with the pioneering work, but also performs a preliminary clustering within the topic (Fig. 1(a,c,e)). We find that while some pioneering works still have an overwhelming impact in the scientific topics they founded, quite a few have several child papers that have established an authority comparable to or even greater than their own. Furthermore, in some of our examples, these prominent child papers seem to have transformed the original topic into multiple new topics (Fig. 1(e)). Although the galaxy map gives a good overview of a scientific topic's current status, the temporal evolution of the topic still needs to be depicted.
In this regard, we go beyond the galaxy map representation and dig deeper into the topic citation network for a more intuitive perception of a topic's flourishing dynamics. Since we interpret scientific topics through their citation patterns, topic evolution is reflected in the development of the topic citation network. Complicated academic citation networks are springing up across the science community as a result of the explosive growth of research activity, both within and across disciplines, and the prevalence of larger teams. The representation and characterization of complex networks has attracted a great deal of effort, among which the appeal to statistical thermodynamics stands out as a principled school of thought. Some studies at the beginning of this century revealed the intimate connections between thermodynamic quantities and complex network dynamics. More recently, the literature has succeeded in characterizing natural networks, neuron networks and biological networks through thermodynamic approaches. In particular, thermodynamic temperature is able to capture critical events in evolving networks. These prior works inspired us, in that heat corresponds to popularity and temperature partly quantifies how our bodies feel the weather. It would be most direct and intuitive if we could ‘feel’ topic vigor in the same way as we perceive the weather. Motivated by this thought, we depict the flourishing and perishing of scientific topics by measuring their knowledge temperature, a quantity designed to portray the evolution of topic impact and popularity by leveraging the rich structural information hidden in citation networks.

Knowledge temperature depends on 3 factors: the evolution of topic size, the evolution of topic knowledge quantity and the advancement of knowledge structure.
As knowledge is a sublimation of information and duplicated information is no longer valuable to knowledge generation, measuring knowledge quantity boils down to evaluating the volume of non-overlapping, or useful, information. This volume can be estimated by examining paper similarity, which essentially involves determining citation significance. Knowledge structure is also closely tied to the question of whether a citation is important for an article. Therefore, to address the key issue in the conception of knowledge temperature, namely judging citation importance, we extracted a skeleton tree for each topic (Fig. 1(b,d,f)). The skeleton tree provides a more lucid topic representation than the galaxy map and accentuates the most essential idea inheritance within the topic by preserving the most valuable citation for every child paper. In particular, by tracing a path in the skeleton tree we can answer 2 fundamental questions: which thought chiefly inspired an idea, and which new ideas it has directly inspired. From another perspective, the skeleton tree exhibits a certain clustering effect in its leaves, as it places closely related articles together. We employed graph embedding techniques to extract the topic skeleton tree. We first measured the importance of every citation in the topic based on structural information and then simplified the topic citation network in 2 steps: first, remove the loops in the citation network; second, leave out relatively unimportant citations while ensuring global connectivity (Fig. 2(a)). Because the extraction process involves a thorough investigation of citation network structure, the topic skeleton tree serves as an indispensable tool for our knowledge temperature design and for visualizing the heat distribution within the topic.

We evaluated topic knowledge temperature from 2 aspects: topic growth and recent structural change in topic knowledge.
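The two-step simplification behind the skeleton tree can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `skeleton_tree` and the `importance` dictionary are hypothetical, the importance scores (which the paper derives from graph embedding) are taken as given, and loops are assumed to have been removed beforehand. The sketch keeps, for every child paper, its single most important citation and then checks that every paper still reaches the pioneering work.

```python
def skeleton_tree(pioneer, importance):
    """Keep the most important citation of every child paper.

    importance maps (citing_paper, cited_paper) -> score. The scores are an
    assumed input; the citation graph is assumed to be loop-free already.
    Returns a dict mapping each child paper to its parent in the tree.
    """
    best = {}
    for (child, cited), score in importance.items():
        if child == pioneer:
            continue  # the pioneering work is the root and keeps no parent
        if child not in best or score > best[child][1]:
            best[child] = (cited, score)
    parent = {child: cited for child, (cited, _) in best.items()}

    # Global connectivity check: every paper must reach the pioneering work.
    for node in parent:
        seen = set()
        current = node
        while current != pioneer:
            if current in seen or current not in parent:
                raise ValueError(f"paper {node!r} does not reach the pioneer")
            seen.add(current)
            current = parent[current]
    return parent


if __name__ == "__main__":
    scores = {
        ("a", "p"): 0.2, ("a", "b"): 0.7,
        ("b", "p"): 0.9,
        ("c", "a"): 0.5, ("c", "p"): 0.1,
    }
    print(skeleton_tree("p", scores))
```

Because every child paper cites the pioneering work by definition, each one has at least one candidate parent, and in a loop-free graph following the retained citations always ends at the root.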
Our core idea is to make an analogy between the topic citation network $G^t$ and an ideal gas. At timestamp $t$, we define the topic knowledge temperature $T^t$ as:

$$T^t = T^t_{\mathrm{growth}} + T^t_{\mathrm{structure}} \tag{1}$$

where $T^t_{\mathrm{growth}}$ measures knowledge increment and $T^t_{\mathrm{structure}}$ estimates the magnitude of changes in knowledge structure between 2 consecutive timestamps.

We initialized $T^t_{\mathrm{growth}}$ by combining 2 expressions for the internal energy of an ideal gas and updated $T^t_{\mathrm{growth}}$ via the ideal gas state equation, $PV = nRT$, under the assumption that the expansion of $G^t$ is an isobaric process. With pressure $P$ invariant and $R$ constant, the variation of $T^t_{\mathrm{growth}}$ is governed by the dynamics of topic mass $n^t$ and topic volume $V^t$. From a macroscopic view of information and knowledge, $n^t$ measures the total amount of overlapped information whereas $V^t$ represents the total amount of information. A simple qualitative analysis shows that $T^t_{\mathrm{growth}}$ increases when topics succeed in accumulating distinct, or useful, information, the knowledge source for the future. Intuitively, promising topics are able to attract a steady or even growing inflow of new information. By contrast, stagnating topics consume more useful information than they receive, and their potential eventually drops. A rising $T^t_{\mathrm{growth}}$ indicates an increasingly solid and rich knowledge base and thus reflects a topic's growing impact. Furthermore, an accelerating increase in $T^t_{\mathrm{growth}}$ suggests a greater capability to collect useful information and thus a faster gain in fame.

Inspired by the temperature design in prior work, we computed $T^t_{\mathrm{structure}}$ between every two adjacent timestamps by making an analogy between the evolution of $G^t$ and an isochoric process. The analogy is legitimate as long as the node number is fixed, which unfortunately does not hold for $G^t$. To solve this issue, we designed a graph shrinking algorithm that transforms the newcomers arriving between timestamps $t-1$ and $t$ into virtual citations among nodes in $G^{t-1}$ (Fig. 2(b)).
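Returning to the growth component, the isobaric update can be written down directly. From $PV = nRT$ with $P$ and $R$ fixed, $T$ is proportional to $V/n$, so the ratio of two consecutive growth temperatures equals the ratio of $V/n$ at the two timestamps. The sketch below assumes that the topic volume and topic mass have already been measured (the paper estimates them from paper similarity, which is not reproduced here); the function and argument names are hypothetical.

```python
def update_growth_temperature(t_prev, volume_prev, mass_prev, volume_now, mass_now):
    """Isobaric update of the growth temperature.

    With P and R constant, PV = nRT gives T proportional to V / n, hence
    T_now / T_prev = (V_now / n_now) / (V_prev / n_prev).
    All inputs are assumed to be positive, pre-computed quantities.
    """
    if min(volume_prev, mass_prev, volume_now, mass_now) <= 0:
        raise ValueError("volume and mass must be positive")
    return t_prev * (volume_now / mass_now) / (volume_prev / mass_prev)


if __name__ == "__main__":
    # Total information grows faster than overlapped information: heating.
    print(update_growth_temperature(1.0, 100.0, 50.0, 150.0, 60.0))
    # Overlapped information grows faster than total information: cooling.
    print(update_growth_temperature(1.0, 100.0, 50.0, 120.0, 80.0))
```

The two calls illustrate the qualitative statement in the text: the temperature rises when total information outpaces overlapped information and falls in the opposite case.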
We defined $T^t_{\mathrm{structure}}$ as the average structural change brought by a node in $G^t$:

$$T^t_{\mathrm{structure}} = \left| \frac{dU^t}{dS^t \, |V^t|} \right| = \left| \frac{U^t - U^{t-1}}{\left(S^t - S^{t-1}\right) |V^t|} \right| \tag{2}$$

where $S^{t-1}$ and $S^t$ are the von Neumann entropies of $G^{t-1}$ and of the weighted reduced graph of $G^t$, and $U^{t-1}$ and $U^t$ are their internal energies. We approximated the von Neumann entropy by node degree and, for simplicity, set the internal energy to be the number of edges. Unlike $T^t_{\mathrm{growth}}$, which focuses on continual knowledge increment, $T^t_{\mathrm{structure}}$ is designed to capture recent critical events and hence assesses a topic's short-term popularity.

Among all the topics, we identified 16 representative topics on which to conduct our knowledge temperature experiment. Their pioneering articles were published between 1959 and 2014 and their research interests fall in domains including machine learning, wireless networks, graph theory, biology and physics. These topics have sizes ranging from over 1000 articles and approximately 5000 citations to more than 31000 articles and nearly 200 thousand citations. We find that the temporal evolution of $T^t$ depicts topic flourishing well, with $T^t_{\mathrm{growth}}$ quantifying knowledge accumulation and $T^t_{\mathrm{structure}}$ reflecting shifts in knowledge structure. $T^t_{\mathrm{growth}}$ varies smoothly and determines the overall trend of $T^t$ (Fig. 3(a)). A big rise in $T^t_{\mathrm{growth}}$ most often corresponds to a significant increase in topic size. Typically, during such periods, some child papers started to gain popularity and to collect a non-trivial number of citations within the topic. They helped the pioneering work maintain the topic's visibility. Their attractiveness to new ideas, added to that of the pioneering work, helped enrich the topic's knowledge pool (Fig. 3(b)).
A direct and visible consequence of this phenomenon is a fortification of the existing knowledge structure, sometimes accompanied by a mild extension (Fig. 3(c-e)). Nonetheless, an ever-growing topic scale does not guarantee a thriving period. For instance, $T^t_{\mathrm{growth}}$ of the topic led by ‘Critical Power for Asymptotic Connectivity in Wireless Networks’ has been decreasing since 2011 despite continuous growth in size. This corresponds to the fact that almost all of the influential child papers within the topic were published no later than 2005. The subsequent lack of new, promising ideas and of remarkable extensions to existing research made the topic lose the community's attention and resulted in its demise. As for the topic led by ‘A unified architecture for natural language processing: deep neural networks with multitask learning’, its decline in $T^t_{\mathrm{growth}}$ since 2015 is somewhat atypical. The decrease is owing to the emergence of popular child papers, published between 2013 and 2014, that largely surpass their parent. The child papers ‘Efficient Estimation of Word Representations in Vector Space’, ‘Distributed Representations of Words and Phrases and their Compositionality’ and ‘Glove: Global Vectors for Word Representation’ have each attracted around 600 citations within the topic, while their total citation counts have all surpassed 8000, far more than their antecedent, whose citation count still remains below 3000. Their achievements are such that they have become the authorities in the domain. Consequently, they have won over the attention of subsequent studies, which in turn affects the knowledge accumulation of the topic created by their parent paper. We observe that articles published after 2016 in the topic have not had a comparable development. This partly confirms the shadowing effect caused by the prominent child papers mentioned above.

$T^t_{\mathrm{structure}}$, unlike $T^t_{\mathrm{growth}}$, can vary greatly over time. It usually accounts for the important fluctuations of $T^t$ (Fig. 4(a,b)).
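To make the quantity behind these fluctuations concrete, the structural temperature of Eq. (2) can be sketched as below. Several assumptions are made and should not be read as the authors' method: the entropy uses one widely used degree-based approximation of von Neumann entropy for undirected, unweighted graphs, whereas the paper works with a weighted reduced graph whose exact expression is not given here; both edge lists are assumed to live on the same node set, as the graph shrinking step is meant to ensure; and all function and parameter names are hypothetical. Internal energy is the number of edges, as stated in the text.

```python
def degree_entropy(num_nodes, edges):
    """Degree-based approximation of von Neumann entropy (assumed form):
    S ~ 1 - 1/|V| - (1/|V|^2) * sum over edges (u, v) of 1 / (d_u * d_v)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    pair_sum = sum(1.0 / (degree[u] * degree[v]) for u, v in edges)
    return 1.0 - 1.0 / num_nodes - pair_sum / num_nodes ** 2


def structure_temperature(num_nodes, edges_prev, edges_now, topic_size):
    """|dU / (dS * |V^t|)| with U = number of edges.

    num_nodes is the fixed node count shared by both graphs; topic_size
    stands for |V^t| in Eq. (2).
    """
    d_energy = len(edges_now) - len(edges_prev)
    d_entropy = degree_entropy(num_nodes, edges_now) - degree_entropy(num_nodes, edges_prev)
    if d_entropy == 0:
        raise ValueError("entropy did not change; temperature is undefined")
    return abs(d_energy / (d_entropy * topic_size))


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]
    star_plus_link = star + [(1, 2)]  # one new citation between two leaves
    print(structure_temperature(4, star, star_plus_link, topic_size=4))
```

In this toy case one added edge changes the approximate entropy only slightly, so the ratio is large; this is the mechanism by which a few structurally important citations can produce a visible spike.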
A high $T^t_{\mathrm{structure}}$ usually marks one of the following 2 events: the formation of sub-topics and the fusion of sub-topics. The first event is a consequence of the arrival of rising stars in the topic. These articles, later proven influential to the topic's evolution, either introduce multiple research directions or contribute to the flourishing of a single novel research focus. The second event takes place when subsequent literature unites the research of prior works. More specifically, a sub-topic merge occurs when there are unusual citations in which an old article cites a young one and the young article is crucial to topic development (Fig. 4(b,d,f)). Both the emergence of a single non-trivial research focus and a sub-topic merge can cause an obvious spike in $T^t_{\mathrm{structure}}$. For instance, the topic led by ‘Neural Networks for Pattern Recognition’ had a sudden increase in $T^t_{\mathrm{structure}}$ when the child paper ‘A Tutorial on Support Vector Machines for Pattern Recognition’ established a third sub-topic direction. In the topic led by ‘On random graphs, I’, the prominent child paper ‘On the evolution of random graphs’ fused prior works' ideas and changed the topic landscape. However, the heat brought by such critical events is ephemeral. In the long run, their impact on a topic's life cycle is eventually reflected in the knowledge accumulation process, which is quantified by $T^t_{\mathrm{growth}}$. We note that influential child papers play an important role in both components of $T^t$ and are thus crucial to a topic's thriving. However, the time between their publication and their visible contribution varies a lot.

Besides knowledge temperature, we can also feel topic vigor by examining the skeleton tree. In fact, the evolution of knowledge temperature is consistent with the development of the skeleton tree. A topic's skeleton tree thrives when the topic gains popularity and fame. In times when $T^t_{\mathrm{growth}}$ rises, the skeleton tree grows increasingly sturdy as newly published papers enrich existing research branches (Fig. 3(c-e)). During periods when $T^t_{\mathrm{structure}}$ soars, topics usually form a new research focus thanks to some prominent child papers. The trend is visualized by the emergence of new non-trivial clusters or branches. Sometimes, recently developed research directions prove to be a big success and start to defy topic authorities by attracting most of the new articles' attention. In such cases, the skeleton tree also manifests a gravity shift, with new branches and clusters developing much faster than the previously dominant ones (Fig. 4(a,c,e)). Finally, if the rise of $T^t_{\mathrm{structure}}$ is due to a sub-topic merge, separate parts of the skeleton tree are connected by a young article that later proves crucial to topic development (Fig. 4(b,d,f)). When a topic loses its appeal, its skeleton tree stagnates, just like its knowledge temperature (Fig. 3(f,g)).

We observe a rich variation in the dynamics of $T^t$, as each topic exhibits a unique development pattern. We identify 4 distinct topic life cycles: rising topics, rise-then-fall topics, awakened topics and rise-and-fall-cycle topics. Rising topics show an overall steady and lasting increase in $T^t$. Child papers that enjoy popularity within the topic arrive rather intermittently, which ensures to some extent a stable knowledge increment. Rise-then-fall topics reach their peak at some point and then go downhill owing to the lack of new development of existing ideas, the absence of a new study focus or the shadowing effect of their outstanding child papers. In addition, their pace of expansion slows during the cooling phase. Awakened topics can develop mildly for as long as 20 years before experiencing a surge in influence. Their sudden flourishing is largely due to scientific communities' recent frenzy in certain domains, such as artificial intelligence. Rise-and-fall-cycle topics manifest a more complicated $T^t$ pattern.
However, their rises and falls also match the global background, such as the introduction of the Internet, the boom in artificial intelligence and the prevalence of online social networks (detailed discussion is in Supplementary Information sections S3.1-S3.4).

How is heat distributed within a topic? To answer this question, we interpreted $T^t$ as the average temperature of $G^t$ and computed a knowledge temperature for every article based on $T^t$. Node knowledge temperature gauges a work's relative popularity and impact within the topic at a certain moment. At each timestamp $t$, we designated the hottest and coldest works and then employed the heat equation to propagate the heat across $G^t$. For a node $u$, the temperature change $\frac{dT_u}{di}$ is (we omit the superscript $t$ of node temperature in the equation):

$$\frac{dT_u}{di} = \sum_{v=1}^{|V^t|} \tilde{A}^t_{vu} \left( T_v - T_u \right) \tag{3}$$

where $\tilde{A}^t_{vu}$ is the thermal conductivity between node $v$ and node $u$. We set the pioneering article to be the hottest node (knowledge temperature = 1) and all the underdeveloped papers to be the coldest nodes (knowledge temperature = 0). We modelled heat propagation via idea inheritance and via youngsters' contribution to knowledge renaissance by forward and backward iterations of the heat equation, respectively. The number of iterations $i$ depends on the average number of hops between 2 randomly selected nodes. Finally we rescaled by $T^t$. Node $u$'s knowledge temperature at timestamp $t$, $T^t_u$, is therefore:

$$T^t_u = T^t_{u,\mathrm{std}} \cdot \frac{T^t}{T^t_{\mathrm{std}}} \tag{4}$$

where $T^t_{u,\mathrm{std}}$ is $u$'s temperature and $T^t_{\mathrm{std}}$ the average temperature derived from the heat equation.

We visualized node knowledge temperature on the skeleton tree. Leaving aside the coldest papers, we observe a ubiquitous phenomenon: the closer an article is to the pioneering work, the hotter it tends to be. Node knowledge temperature decreases along paths in the skeleton tree (Fig. 4(c-f)).
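The propagation and rescaling steps of Eqs. (3) and (4) can be sketched with a simple explicit iteration. This is a minimal illustration under stated assumptions rather than the paper's procedure: the conductivity matrix is taken as a given symmetric input, the hottest and coldest nodes are held fixed during iteration, undetermined nodes start at 0.5, a small step size is used for numerical stability, and the separate forward and backward passes described in the text are collapsed into a single pass. All names are hypothetical.

```python
def diffuse_heat(conductivity, hot, cold, steps, step_size=0.1):
    """Iterate dT_u/di = sum_v A[v][u] * (T_v - T_u) on a small graph.

    conductivity[v][u] is an assumed, non-negative thermal conductivity.
    Nodes in `hot` stay at 1.0 and nodes in `cold` stay at 0.0 (assumption).
    """
    n = len(conductivity)
    temp = [0.5] * n  # assumed starting value for undetermined nodes
    for u in hot:
        temp[u] = 1.0
    for u in cold:
        temp[u] = 0.0
    fixed = set(hot) | set(cold)
    for _ in range(steps):
        # Compute all changes from the current temperatures, then apply them.
        delta = [
            sum(conductivity[v][u] * (temp[v] - temp[u]) for v in range(n))
            for u in range(n)
        ]
        for u in range(n):
            if u not in fixed:
                temp[u] += step_size * delta[u]
    return temp


def scale_to_topic(temp, topic_temperature):
    """Rescale node temperatures so that their mean equals the topic's T^t."""
    mean = sum(temp) / len(temp)
    if mean == 0:
        raise ValueError("all nodes are cold; cannot rescale")
    return [t * topic_temperature / mean for t in temp]


if __name__ == "__main__":
    # Chain: pioneer (0) - paper 1 - paper 2 - underdeveloped paper (3).
    chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    raw = diffuse_heat(chain, hot=[0], cold=[3], steps=500)
    print(scale_to_topic(raw, topic_temperature=2.0))
```

On the chain, the temperatures settle into a steady decrease from the pioneering work to the cold end, which mirrors the observation that node temperature decreases along paths in the skeleton tree.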
Although the pioneering work is the only known hottest node, we identify other heat sources, the majority of which are the centers of non-trivial clusters. Most heat sources happen to be among the most-cited child papers within a topic. Their value is primarily intrinsic: their own research content contributes greatly to the topic's survival and flourishing. Another type of heat source consists of articles situated between clusters. Such papers may not have made astonishing discoveries or attracted many followers, but it is their studies that have inspired some influential subsequent work. Their value lies essentially in the enlightenment they provide.

In an effort to better understand the general heat distribution within topics, our preliminary observation prompted us to study the relation between node knowledge temperature and article age, as papers located in skeleton tree cores are parents or ancestors of papers on the periphery. We find that, regardless of research theme, older papers indeed tend to have higher knowledge temperatures (Fig. 5). Older papers benefit from a longer time span and tend to diffuse their ideas better thanks to their numerous followers, a tendency in line with our intuition. Since we assume pioneering works possess the "hottest" knowledge, the gradual temperature decline illustrates well that idea inheritance and innovation take place simultaneously in every scientific topic. However, we observe a drop in average node knowledge temperature among the oldest papers in half of the topics. Two phenomena can explain the anomaly. Some topics contain a tiny fraction of atypical citations in which younger articles are cited by older papers or by papers published at approximately the same time. When the younger articles happen to be pioneering works, the oldest papers are no longer the topic founders. They have usually inspired few or even no child papers in the topics. Consequently, they are among the coldest nodes. In rare cases, these papers inspired a certain quantity of works.
But they remain "cold" owing to a research focus that differs from that of the pioneering works, even though they are connected to the latter. Their citations are more like peer bonds than a symbol of inspiration and idea inheritance. Such is the case for the pioneering work ‘Particle swarm optimization’ and its peer and popular child paper ‘A new optimizer using particle swarm theory’.

Even if we leave aside the cold old articles, the heat distribution is neither simple nor monotonic. We observe in most topics that parent papers are not always hotter than their descendants. According to our design, node knowledge temperature is affected by 2 factors: the heat level of the node's own research content and the promotion gained from its descendants. Therefore, a colder parent or ancestor is due either to its less prevalent ideas or to a poor general performance of its children. This phenomenon implies that an important status within the topic does not necessarily bring much fame.

We further compared node knowledge temperature with in-topic citation count, a traditional article-level impact metric, to get a better understanding of their similarities and differences (Fig. S49). We find a weak positive correlation between the two quantities among the best-cited papers in topics. In particular, we highlighted the most-cited child papers together with the pioneering works on current skeleton trees. Most of them have an above-average knowledge temperature, as they are represented as yellow, orange or red nodes (see, for example, the current skeleton trees in Supplementary Information Fig. S3, S9, S15 and S35). However, there are exceptions. For instance, in the topic led by ‘Particle swarm optimization’, the popular child paper ‘A new optimizer using particle swarm theory’ (NOPST) is among the coldest despite being the most influential child paper in terms of citation count (Fig. S35).
NOPST was published in the same year as the pioneering work and it cited only the pioneering work. Its low temperature is due to a research focus that differs from that of the pioneering work and to an overall low heat level among its children. The latter is to some extent also a consequence of the former, as the pioneering work carries the most prevalent idea. The difference in focus is also reflected by their separation in the skeleton tree.

We also tracked the knowledge temperature evolution of relatively popular child papers within a topic and found a phenomenon similar to the one observed at topic level. While an article's own knowledge largely determines its heat level, child papers sometimes play a perceptible role in boosting or maintaining its popularity and impact. For example, in the topic led by the paper ‘Bose-Einstein condensation in a gas of sodium atoms’, the article ‘Bose-Einstein condensation of exciton polaritons’ has kept getting hotter since its publication, despite a global cooling since 2013, thanks to an above-average active development (Supplementary Information S3.2.7). Our finding is consistent with research demonstrating that papers need new citations to keep their visibility. Besides, in some topics, especially the one led by ‘Collective dynamics of ‘small-world’ networks’, we frequently find that popular child papers were published in renowned journals such as Nature and Science (Supplementary Information section 3.4). Our observation accords with research suggesting a positive association between journal prestige and high article impact.

Beyond individual topics, we find that several scientific topics are intimately connected. Some pioneering works occupy a primordial position in other topics' skeleton trees. Furthermore, these closely related topics manifest similar knowledge temperature dynamics. However, such similarity does not correspond very well with idea inheritance and development in some cases.
For instance, the paper ‘The capacity of wireless networks’ (CMN) is the most successful child paper of the pioneering work ‘Critical Power for Asymptotic Connectivity in Wireless Networks’. It plays a crucial role in the topic's prosperity (Fig. S12) by jointly inspiring one third of the topic members, most of which were published during the flourishing period. Besides, CMN surpassed and took over from its predecessor as the new authority in their domain in just a few years. Yet, according to their topic knowledge temperatures, it is the topic led by CMN that went downhill first. To this end, we wanted to design a mechanism that allows us to better capture the interactions among closely connected topics. Following our skeleton tree notion, we were inspired by the nutrition transfer among real trees in a forest. We hence treated scientific topics as trees and conceived a forest helping mechanism in which thriving topics transfuse a small fraction of their vigor to their dying siblings. The amount of shared energy depends on both the ages and the size of the topic group. When we compare topic knowledge temperatures before and after forest helping, we find that our helping mechanism mildly regulates the temperatures, as if it took into account the "background popularity", that is, the average popularity of the bigger research topic to which the group belongs. Overall, forest helping slightly reduces the fluctuation of $T^t$ (Fig. S50).

In summary, we report a thermodynamic approach to depicting the rise and fall of scientific topics. We design knowledge temperature, an intuitive and quantitative metric that evaluates the dynamics of a topic's overall popularity and impact by fully leveraging the scale and structure dynamics of the citation network through the skeleton tree. A continuous stream of useful information is the key to a topic's prosperity in the long run, and the arrival of eminent child papers contributes greatly to it. In the short term, critical events such as the merging and emergence of sub-topics also boost a topic's vigor.
In addition, we examine the heat diffusion within topics and discover that older articles generally have a better chance of diffusing their ideas and thus enjoy a higher popularity within the topic. However, exceptions are widespread, suggesting that the positive correlation between heat level and an article's age and impact remains weak. Finally, we design a forest helping mechanism to better depict idea inheritance and development among intimately associated topics. Although knowledge temperature cannot be used directly as a scientific impact metric, our study suggests a new possibility for quantifying research impact in a most intuitive way.

Data Availability

All code is available at https://github.com/drlisette/knowledge-temperature. Data are available at https://github.com/drlisette/knowledge-temperature. Other related, relevant data are available from the corresponding author upon reasonable request.

[Figures 1–5]

References

Lane, J. & Bertuzzi, S. Measuring the results of science investments. Science, 678–680 (2011).

Bromham, L., Dinnage, R. & Hua, X. Interdisciplinary research has consistently lower funding success. Nature, 684–687 (2016).

Lane, J. Let's make science metrics more scientific. Nature, 488–489 (2010).

Radicchi, F., Fortunato, S., Markines, B. & Vespignani, A. Diffusion of scientific credits and the ranking of scientists. Phys. Rev. E, 056103 (2009).

Clauset, A., Arbesman, S., Larremore, D. & Vespignani, A. Systematic inequality and hierarchy in faculty hiring networks. Sci. Adv., e1400005–e1400005 (2015).

Jordi, D. et al. The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact. PLoS ONE, e51332 (2012).

Adams, J. Early citation counts correlate with accumulated impact. Scientometrics, 567–581, DOI: 10.1007/s11192-005-0228-9 (2005).

Radicchi, F., Weissman, A. & Bollen, J. Quantifying perceived impact of scientific publications. J. Informetrics, 704–712, DOI: 10.1016/j.joi.2017.05.010 (2017).

Wang, D., Song, C. & Barabási, A. Quantifying long-term scientific impact. Science, 127–132, DOI: 10.1126/science.1237825 (2013).

He, Z., Lei, Z. & Wang, D. Modeling citation dynamics of “atypical” articles. J. Assoc. for Inf. Sci. Technol., 1148–1160 (2018).

Hajra, K. & Sen, P. Aging in citation networks. Phys. A: Stat. Mech. its Appl., 44–48 (2005).

Shen, H., Wang, D., Song, C. & Barabási, A. Modeling and predicting popularity dynamics via reinforced Poisson processes. The Twenty-Eighth AAAI Conf. on Artif. Intell., 1 (2014).

Liu, L., Wang, Y., Sinatra, R. et al. Hot streaks in artistic, cultural, and scientific careers. Nature, 396–399, DOI: 10.1038/s41586-018-0315-8 (2018).

Hirsch, J. An index to quantify an individual's scientific research output. Proc. Natl. Acad. Sci. United States Am., 16569–16572, DOI: 10.1073/pnas.0507655102 (2005).

Egghe, L. Theory and practice of the g-index. Scientometrics, 131–152, DOI: 10.1007/s11192-006-0144-7 (2006).

Boyack, K. W. & Klavans, R. Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? J. Assoc. for Inf. Sci. Technol., 2389–2404, DOI: 10.1002/asi.21419 (2010).

Jia, T., Wang, D. & Szymanski, B. Quantifying patterns of research-interest evolution. Nat. Hum. Behav., 0078 (2017).

Guimerà, R., Uzzi, B., Spiro, J. & Amaral, L. Team assembly mechanisms determine collaboration network structure and team performance. Science, 697–702, DOI: 10.1126/science.1106340 (2005).

Wu, L., Wang, D. & Evans, J. Large teams develop and small teams disrupt science and technology. Nature, 378–382, DOI: 10.1038/s41586-019-0941-9 (2018).

Mikulecky, D. Network thermodynamics and complexity: a transition to relational systems theory.

Comput. & Chem. , 369–391, DOI: 10.1038/s41586-019-0941-9 (2001).

Estrada, E. & Hatano, N. Statistical-mechanical approach to subgraph centrality in complex networks.

Chem. Phys. Lett. , 247–251, DOI: 110.1016/j.cplett.2007.03.098 (2009).

Hartonen, T. & Annila, A. Natural networks as thermodynamic systems.

Complexity , 53–62, DOI: 10.1002/cplx.21428 (2012).

Tkaˇcik., G. et al.

Thermodynamics and signatures of criticality in a network of neurons.

Proc. Natl. Acad. Sci. ,11508–11513, DOI: 10.1073/pnas.1514188112 (2015).

Hubbard, J., Halter, M., Sarkar, S. & Plant, A. The role of ﬂuctuations in determining cellular network thermodynamics.

PLoS ONE , e0230076, DOI: 10.1371/journal.pone.0230076 (2020).

Ye, C., Wilson, R., Rossi, L., Torsello, A. & Hancock, E. Thermodynamic analysis of time evolving networks.

Entropy , 759, DOI: 10.3390/e20100759 (2018). 12 Ye, C. et al.

Thermodynamic characterization of networks using graph polynomials.

Phys. Rev. E , 032810, DOI:10.1103/PhysRevE.92.032810 (2015).

Ye, C., Wilson, R., Comin, C., Costa, L. & Hancock, E. Approximate von neumann entropy for directed graphs.

Phys. Rev.E , 052804, DOI: 10.1103/PhysRevE.89.052804 (2014).

Pollmann, T. Forgetting and the ageing of scientiﬁc publications.

Scientometrics , 43–54, DOI: 10.1023/A:1005613725039 (2000).

Wang, J. Citation time window choice for research impact evaluation.

Scientometrics , 851–872, DOI: 10.1007/s11192-012-0775-9 (2013).

Didegah, F. & Thelwall, M. Which factors help authors produce the highest impact research? collaboration, journal anddocument properties.

J. Informetrics , 861–873, DOI: 10.1016/j.joi.2013.08.006 (2013).

Giovannetti, M. et al.

At the root of the wood wide web self recognition and non-self incompatibility in mycorrhizalnetworks.

Plant signaling & behavior , 1–5, DOI: 10.4161/psb.1.1.2277 (2006).

Author contributions statement

L.F. conceived the idea to depict topic flourishing dynamics by thermodynamic temperature, checked model feasibility and summarised results. D.L. designed the knowledge temperature model, did data visualization, conceived the experiments, and analysed and summarised the results. Q.L. processed the topic data and optimized the skeleton tree algorithm. X.W. gave invaluable comments for paper writing.

Additional information

Competing interests

The author(s) declare no competing interests.

Figure captions

Figure 1: Comparison between galaxy map and topic skeleton tree.

In galaxy map: Node size and title size are proportional to total citation count. Only the most-cited papers are labelled with titles. The node colour of the pioneering work is red. The node colours of the other articles are determined by their positions under the ForceAtlas layout algorithm. Nodes in the same cluster take the same colour (yellow, green, blue or pink). In topic skeleton tree: Node size (except pioneering work) is proportional to structure entropy. The pioneering work node is twice the maximum size of the child paper nodes. Node colour is the same as in the galaxy map. Only the pioneering work is labelled by its title. (a,b) Topic led by ‘Critical Power for Asymptotic Connectivity in Wireless Networks’. (a) Numerous child papers, especially ‘The capacity of wireless networks’ and ‘HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks’, have outperformed the pioneering work. (b) After initial development, the topic has found two research foci. (c,d) Topic led by ‘Latent dirichlet allocation’. (c) The pioneering work has a dominant influence. (d) Three research directions have derived directly from the initial idea. (e,f) Topic led by ‘On random graphs, I’. (e) Two influential child papers, ‘On the evolution of random graphs’ and ‘The Structure and Function of Complex Networks’, seem to split the topic into two parts. (f) The pioneering work has inspired one school of thought in particular. There is no significant division in the topic’s knowledge structure.

Figure 2: Skeleton tree extraction and graph shrinking demo.

The red node labelled "P" represents the pioneering work. Green nodes are child papers. A directed edge from A to B represents "B cites A". (a) Skeleton tree extraction. From left to middle: loop cutting. Child papers c … (b) $T^t_{structure}$ computation. The graph shrinking process transforms the newly arrived articles into virtual citations among existing papers. For example, child paper c … $t-1$ … $t$ and cites all papers in the topic. Its citations suggest that c …, disconnected in $G^{t-1}$, have certain connections in their research content. We remove c … $G^t$’s shrunken counterpart, $\bar{G}^t$.

Figure 3: Knowledge temperature (especially $T^t$ and $T^t_{growth}$) and skeleton tree evolution of the topic led by ‘A unified architecture for natural language processing: deep neural networks with multitask learning’.

Nodes in the skeleton tree are coloured according to their knowledge temperature, with red being the hottest, yellow the average level and blue the coldest within the topic. Node size (except pioneering work) is proportional to (re-scaled) structure entropy. The pioneering work node is twice the maximum size of the child paper nodes. (a) Knowledge temperature evolution. $T^t_{growth}$ dominates $T^t$. (b) Current topic skeleton tree. The pioneering work and the 4 most-cited papers within the topic are labelled by title. (c,d,e) Topic skeleton tree by the end of 2011, 2013 and 2015. The thriving period is characterized by a steady knowledge accumulation, depicted by a fast-growing skeleton tree where small new clusters emerge and existing branches become increasingly robust. (f,g) Topic skeleton tree by the end of 2017 and 2019. The stagnation period is reflected by a decelerating growth and an almost fixed tree shape.

Figure 4: Knowledge temperature (especially $T^t$ and $T^t_{structure}$) and skeleton tree evolution of topics led by ‘The capacity of wireless networks’ (CWN) and ‘On random graphs, I’ (RG).

Nodes in the skeleton tree are coloured according to their knowledge temperature, with red being the hottest and blue the coldest within the topic. Node size (except pioneering work) is proportional to (re-scaled) structure entropy. The pioneering work node is twice the maximum size of the child paper nodes. (a,b) Knowledge temperature evolution. $T^t_{structure}$ accounts for $T^t$’s fluctuations. (c,e) Skeleton tree of the topic led by CWN by the end of 2003 and 2007. Advancements are visible in all directions. In particular, the gravity shift in the tree implies the emergence of a new research focus, which in turn yields a surge in $T^t_{structure}$. (d,f) Skeleton tree of the topic led by RG by the end of 1979 and 1984. The article ‘On the evolution of random graphs’, published in 1984, fuses the previously separated parts due to an atypical citation from an older article, ‘On the existence of a factor of degree one of a connected random graph’. The merging in the topic knowledge structure pushed up $T^t_{structure}$ during that period.

Figure 5: Relation between article age and node knowledge temperature for 16 topics.

Article age = 2020 - year of publication. The grey dotted horizontal line marks the topic knowledge temperature (average level) in 2020. (a) Topic led by ‘Regulatory T Cells: Mechanisms of Differentiation and Function’. (b) Topic led by ‘Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling’. (c) Topic led by ‘Neural networks for pattern recognition’. (d) Topic led by ‘Critical Power for Asymptotic Connectivity in Wireless Networks’. (e) Topic led by ‘The capacity of wireless networks’. (f) Topic led by ‘Efficient Estimation of Word Representations in Vector Space’. (g) Topic led by ‘Coverage problems in wireless ad-hoc sensor networks’. (h) Topic led by ‘A neural probabilistic language model’. (i) Topic led by ‘A unified architecture for natural language processing: deep neural networks with multitask learning’. (j) Topic led by ‘Bose-Einstein condensation in a gas of sodium atoms’. (k) Topic led by ‘Long short-term memory’. (l) Topic led by ‘Particle swarm optimization’. (m) Topic led by ‘On random graphs, I’. (n) Topic led by ‘Collective dynamics of ‘small-world’ networks’. (o) Topic led by ‘Latent dirichlet allocation’. (p) Topic led by ‘A FUNDAMENTAL RELATION BETWEEN SUPERMASSIVE BLACK HOLES AND THEIR HOST GALAXIES’.

Can We ‘Feel’ the Temperature of Knowledge? Modelling Scientific Popularity Dynamics via Thermodynamics

S1 Data Description
S2 Model
  S2.1 Topic Skeleton Tree
  S2.2 Structure Entropy
  S2.3 Topic Knowledge Temperature
    S2.3.1 $T^t_{growth}$
    S2.3.2 $T^t_{structure}$
  S2.4 Node Knowledge Temperature
  S2.5 Forest Helping
S3 Experiments
  S3.1 Rising Topics
    S3.1.1 Regulatory T Cells: Mechanisms of Differentiation and Function
    S3.1.2 Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
    S3.1.3 Neural networks for pattern recognition
  S3.2 Rise-then-fall Topics
    S3.2.1 Critical Power for Asymptotic Connectivity in Wireless Networks
    S3.2.2 The capacity of wireless networks
    S3.2.3 Efficient Estimation of Word Representations in Vector Space
    S3.2.4 Coverage problems in wireless ad-hoc sensor networks
    S3.2.5 A neural probabilistic language model
    S3.2.6 A unified architecture for natural language processing: deep neural networks with multitask learning
    S3.2.7 Bose-Einstein condensation in a gas of sodium atoms
  S3.3 Awakened topics
    S3.3.1 Long short-term memory
    S3.3.2 Particle swarm optimization
  S3.4 Rise-fall-cycle topics
    S3.4.1 On random graphs, I
    S3.4.2 Collective dynamics of ‘small-world’ networks
    S3.4.3 Latent dirichlet allocation
    S3.4.4 A FUNDAMENTAL RELATION BETWEEN SUPERMASSIVE BLACK HOLES AND THEIR HOST GALAXIES

S1 Data Description

We collected topic citation relations from academic databases including DBLP, arXiv, Elsevier and Springer. Each topic is led by an article that has had a profound influence in a certain domain. We refer to these papers as pioneering papers or leading papers. A scientific topic includes a pioneering paper, all the articles that directly cite it and all the citations among them. We chose 16 topics from our dataset to conduct the knowledge temperature experiment. Pioneering paper information is listed in Table S1 and topic size is listed in Table S2. Topics are ordered by publishing year.

Among the 16 topics, we identify 3 topic groups, each containing 2 or 3 topics:

1. Wireless network group. The group is jointly led by ‘Critical Power for Asymptotic Connectivity in Wireless Networks’ and ‘The capacity of wireless networks’.
2. RNN gated unit group. The group is jointly led by ‘Long short-term memory’ and ‘Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling’.
3. Word embedding group. The group is jointly led by ‘A unified architecture for natural language processing: deep neural networks with multitask learning’, ‘A neural probabilistic language model’ and ‘Efficient Estimation of Word Representations in Vector Space’.

| Leading paper | Year | Journal / conference series |
|---|---|---|
| On random graphs, I | 1959 | |
| Bose-Einstein condensation in a gas of sodium atoms | 1995 | Physical Review Letters |
| Particle swarm optimization | 1995 | International Conference on Networks (ICON) |
| Neural networks for pattern recognition | 1995 | Advances in Computers |
| Long short-term memory | 1997 | Neural Computation |
| Collective dynamics of ‘small-world’ networks | 1998 | Nature |
| Critical Power for Asymptotic Connectivity in Wireless Networks | 1999 | |
| The capacity of wireless networks | 2000 | IEEE Transactions on Information Theory |
| A FUNDAMENTAL RELATION BETWEEN SUPERMASSIVE BLACK HOLES AND THEIR HOST GALAXIES | 2000 | The Astrophysical Journal |
| Coverage problems in wireless ad-hoc sensor networks | 2001 | International Conference on Computer Communications (INFOCOM) |
| Latent dirichlet allocation | 2003 | Journal of Machine Learning Research |
| A neural probabilistic language model | 2003 | Journal of Machine Learning Research |
| A unified architecture for natural language processing: deep neural networks with multitask learning | 2008 | International Conference on Machine Learning (ICML) |
| Regulatory T Cells: Mechanisms of Differentiation and Function | 2012 | Annual Review of Immunology |
| Efficient Estimation of Word Representations in Vector Space | 2013 | International Conference on Learning Representations (ICLR) |
| Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling | 2014 | arXiv: Neural and Evolutionary Computing |

Table S1. Pioneering Paper Information

| Leading paper | Node num. | Edge num. |
|---|---|---|
| On random graphs, I | 5389 | 17098 |
| Bose-Einstein condensation in a gas of sodium atoms | 2338 | 9171 |
| Particle swarm optimization | 31800 | 183341 |
| Neural networks for pattern recognition | 17046 | 42748 |
| Long short-term memory | 16777 | 98553 |
| Collective dynamics of ‘small-world’ networks | 25548 | 206646 |
| Critical Power for Asymptotic Connectivity in Wireless Networks | 1078 | 4998 |
| The capacity of wireless networks | 7644 | 51788 |
| A FUNDAMENTAL RELATION BETWEEN SUPERMASSIVE BLACK HOLES AND THEIR HOST GALAXIES | 2432 | 34120 |
| Coverage problems in wireless ad-hoc sensor networks | 1546 | 8865 |
| Latent dirichlet allocation | 18813 | 114969 |
| A neural probabilistic language model | 3265 | 22912 |
| A unified architecture for natural language processing: deep neural networks with multitask learning | 2733 | 13855 |
| Regulatory T Cells: Mechanisms of Differentiation and Function | 1381 | 4190 |
| Efficient Estimation of Word Representations in Vector Space | 8133 | 36219 |
| Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling | 2282 | 4675 |

Table S2. Topic Overview

S2 Model

Our core idea is to treat the citation network $G^t = (V^t, E^t)$ as a thermodynamic system, more specifically, an ideal gas. $G^t$ is a directed graph whose nodes consist of a pioneering paper and all the articles that directly cite it, and whose edges are the citations among them. Its adjacency matrix $A^t$ is defined as:

$$A^t_{uv} = \begin{cases} 1 & \text{if } v \text{ cites } u \\ 0 & \text{otherwise} \end{cases}$$

S2.1 Topic Skeleton Tree

The skeleton tree illustrates the knowledge structure of a topic. Its evolution reveals a topic’s development pattern. The extraction of the skeleton tree is essentially a process that reduces a graph to a tree. We denote $G^t$’s skeleton tree by $Tree^t = (V^t_T, E^t_T)$. For notational simplicity, we omit the superscript $t$ for variables that appear in the rest of this subsection. There are altogether 3 steps in $Tree^t$’s construction:

1. We perform node embedding and compute the distance matrix $EmbedDist$, which gives the pairwise node distances in the embedding space.
2. We derive the matrix $DiffIdx$ from $EmbedDist$ to measure the difference between every node pair. The vector $ReductionIdx$, a node score that serves to judge citation importance, is computed afterwards. We rely on $ReductionIdx$ to prune $G^t$ in the following step.
3. We reduce $G^t$ to $Tree^t$ by removing less important references while ensuring the overall connectivity. The significance of a citation is determined by the similarity of the 2 papers, which is assessed through their reduction indices. The process involves loop cutting and tree pruning. In $Tree^t$, every node except the root, which is exactly the pioneering node, has at most one citation.

We start by slightly modifying the adjacency matrix $A$ by adding a self-loop to the pioneering work. This is for the convenience of spectral decomposition. Then, we compute the out-degree matrix $D$ and the normalized Laplacian matrix $\tilde{L} = D^{-1/2}(D - A)D^{-1/2}$. $D$ is a diagonal matrix whose diagonal entries equal the out-degree, or practically speaking the in-topic citation count, of each node. We next perform a full spectral decomposition of $\tilde{L}$. The eigenvectors are our node embeddings and $EmbedDist$ is a distance matrix with entries

$$EmbedDist_{u,v} = \lVert eigenvector_u - eigenvector_v \rVert .$$

Now we proceed to compute the difference matrix $DiffIdx$. For a node pair $(u, v)$, we define their difference index $DiffIdx_{u,v}$ as:

$$DiffIdx_{u,v} = \sum_{v_{parent}} d_{u, v_{parent}}$$

where the $v_{parent}$ are the predecessors of $v$ and $d_{u, v_{parent}}$ is the weight of the shortest weighted path between $u$ and $v_{parent}$:

$$d_{u, v_{parent}} = \begin{cases} \sum_{(i,j) \in path} EmbedDist_{i,j} & \text{if there exists a path between } u \text{ and } v_{parent} \\ MaxDist \times avgStep & \text{otherwise} \end{cases}$$

$MaxDist$ is the biggest distance between two connected nodes, $MaxDist = \max_{(a,b) \in E^t} (EmbedDist_{a,b} A_{a,b})$, and $avgStep$ is the average hop number of all shortest paths between any two reachable nodes. $DiffIdx$ gauges the difference between $u$ and $v$ by involving the works that inspire $v$. If $u$ and $v_{parent}$ are reachable from each other, there is some degree of similarity in their ideas or research topics, and we represent their distance by the weight of the shortest path. Otherwise, we model their correlation by a long imaginary path of $avgStep$ hops with step length $MaxDist$. Therefore, the greater $DiffIdx_{u,v}$ is, the more different $u$ and $v$ are.

For a node $u$, its reduction index $ReductionIdx_u$ is defined as the sum of its difference indices:

$$ReductionIdx_u = \sum_{v \in V^t \setminus u} DiffIdx_{u,v}$$

The vector $ReductionIdx$ helps to determine the importance of citations. A citation between two articles with similar reduction indices is considered more valuable than one between two papers with different reduction indices.

We are now ready to extract the topic skeleton tree. The first step is to find and cut loops in $G^t$. We cut a loop by removing its least important edge, that is, the edge whose endpoints have the most different reduction indices. Nonetheless, we try to ensure that the edge we cut is not the last citation left for some node, so as to preserve overall connectivity as much as possible. After loop cutting, we obtain a tree. The second step is to remove redundant citations in the tree. Recall that we only keep one citation for every node except the root in $Tree^t$. Fig. 2(a) illustrates the whole process with a toy example.

S2.2 Structure Entropy

We adopt structure entropy to determine the node size in the skeleton tree visualisation. Structure entropy measures the uncertainty of the tree structure if node $u$ is absent. Consequently, it makes sense to evaluate the importance of a paper to knowledge passing within the topic by structure entropy. For a node $u$ other than the root, its structure entropy $S^t_u$ is defined as:

$$S^t_u = -\frac{g^t_{T,u}}{|E^t_T|} \log \frac{V^t_{T,u}}{V^t_{T,u_{parent}}}$$

$g^t_{T,u}$ is the cut size of the sub-tree $Tree^t_u$ whose root is $u$. It is the sum of the degrees, in $Tree^t$, of the nodes in $Tree^t_u$. $E^t_T$ is the edge set of the skeleton tree. $V^t_{T,u}$ is the number of nodes $Tree^t_u$ contains (the sum of out-degrees of $Tree^t_u$) and $V^t_{T,u_{parent}}$ is the number of nodes $Tree^t_{u_{parent}}$ has.

The term before the logarithm measures the importance of $Tree^t_u$ to the whole skeleton tree, and the logarithm describes the uncertainty of $Tree^t_u$ with respect to its parent sub-tree.

The structure entropy of the entire topic, $S^t$, is defined as the sum of node structure entropies:

$$S^t = \sum_{u \in T^t,\, u \neq root} S^t_u = -\sum_{u \in T^t,\, u \neq root} \frac{g^t_{T,u}}{|E^t_T|} \log \frac{V^t_{T,u}}{V^t_{T,u_{parent}}}$$

S2.3 Topic Knowledge Temperature

Topic knowledge temperature $T^t$ is defined as:

$$T^t = T^t_{growth} + T^t_{structure}$$

where $T^t_{growth}$ measures the knowledge increment and $T^t_{structure}$ estimates the degree of the latest structural changes in the topic’s knowledge framework.

S2.3.1 $T^t_{growth}$

We initialise $T^t_{growth}$ by combining the 2 expressions of an ideal gas’s internal energy $U$:

$$U = cnT, \qquad U = k\, e^{S/(cn)}\, V^{-R/c}\, n^{(R+c)/c}$$

where $S$ is entropy, $n$ is the amount of substance (number of moles), $V$ is volume, $R$ is the ideal gas constant, $c$ is heat capacity and $k$ is an adjusting coefficient.

As a result, the initial value $T^0_{growth}$ is:

$$T^0_{growth} = k\, e^{S^0/(c n^0)} \left(\frac{n^0}{V^0}\right)^{R/c}$$

where $S^0$ is the initial structure entropy of the topic, $n^0$ the initial topic mass, $V^0$ the initial topic volume, $k$ a coefficient to be determined (the constant factor $1/c$ that arises from $U = cnT$ can be absorbed into $k$) and $R$ and $c$ two constants.

Next, we model $G^t$’s evolution as an isobaric process of an ideal gas. Hence, according to the ideal gas state equation $PV = nRT$, by fixing the pressure $P$, $T^t_{growth}$ is updated by the following expression:

$$T^t_{growth} = T^{t-1}_{growth} \cdot \frac{n^{t-1}}{n^t} \cdot \frac{V^t}{V^{t-1}}$$

We set the topic volume $V^t$ to be the node number, $V^t = |V^t|$, and the topic mass $n^t$ as $n^t = |V^t| - UsefulInfo^t$. The entropy $S^t$ used here is the topic structure entropy derived in the previous subsection. $UsefulInfo^t$ is based on $DiffIdx$ from the skeleton tree extraction:

$$UsefulInfo^t = \sum_{(u,v) \in Tree^t} \frac{DiffIdx_{u,v}}{\max_{(a,b) \in Tree^t} DiffIdx_{a,b}}$$

We would like to finish this part with a qualitative analysis of $T^t_{growth}$’s dynamics from a macroscopic view of information and knowledge. Knowledge originates from information, but information and knowledge have different characteristics. Information is only valuable once. Duplicate information does not create any additional value and thus cannot be used to create knowledge. Knowledge is an understanding and a refinement of information. It is always valuable. Normally speaking, we cannot have too much knowledge.

Bearing the interplay of knowledge and information in mind, we are now ready to interpret the symbolic meaning of volume $V^t$ and mass $n^t$. $V^t$ represents the total amount of information possessed by a topic at timestamp $t$. $UsefulInfo^t$ signifies the amount of useful information, and thus $n^t$ symbolises the total amount of overlapped, or used, information. We assume that each paper carries one unit of information. Yet we derive useful information edge by edge. This is because in a skeleton tree, all articles except the pioneering paper have only one citation, and if article $u$ and its ’parent’ (’child’) article have drastically different $DiffIdx$ values, they are likely to have distinct research contents. In this case, therefore, even if one of them has completely overlapped content with some other article(s), we can still roughly determine one unit of new information.

From the update rule of $T^t_{growth}$, we distinguish 3 cases (suppose $G^t$ always expands, so that $V^t$ always increases):

1. $T^t_{growth}$ will not change if $V^t$ and $n^t$ have had identical rates of increase during the last period.
2. $T^t_{growth}$ will decrease if $n^t$ has increased faster than $V^t$ over the last period.
3. $T^t_{growth}$ will increase if $V^t$ has increased faster than $n^t$ over the last period.
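As a rough illustration of the update rule and the three cases above, the following Python sketch computes $UsefulInfo^t$, the topic mass $n^t$ and the isobaric update of $T^t_{growth}$. The function names and the toy numbers are hypothetical and are not taken from the authors' code; the $DiffIdx$ values on the skeleton-tree edges are assumed to be already available.

```python
def useful_info(tree_edge_diff_idx):
    """UsefulInfo^t: sum of DiffIdx over skeleton-tree edges,
    divided by the largest DiffIdx found on those edges."""
    largest = max(tree_edge_diff_idx)
    return sum(tree_edge_diff_idx) / largest


def topic_mass(num_nodes, useful):
    """n^t = |V^t| - UsefulInfo^t (overlapped, or used, information)."""
    return num_nodes - useful


def update_t_growth(t_prev, n_prev, n_curr, v_prev, v_curr):
    """Isobaric update: T^t = T^{t-1} * (n^{t-1} / n^t) * (V^t / V^{t-1})."""
    return t_prev * (n_prev / n_curr) * (v_curr / v_prev)


if __name__ == "__main__":
    # Hypothetical toy values: volume grows by 20 %, mass by 12.5 %.
    print(update_t_growth(1.0, n_prev=80.0, n_curr=90.0,
                          v_prev=100.0, v_curr=120.0))
```

In this toy case the volume grows faster than the mass, so the returned temperature lies above its previous value, which corresponds to case 3.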
$T^t_{growth}$ goes up when the quantity of total information grows faster than the amount of duplicate information. Since $V^t - n^t = UsefulInfo^t$, $T^t_{growth}$ rises when there is an accelerated increase in useful information. The more abundant the useful information, the more likely a topic is to create new knowledge in the future and the greater its potential. Otherwise, the topic "consumes" information faster than it accumulates information capital. If the tendency continues, it will have less information reserve for knowledge generation in the future. Its growth potential declines and eventually it ’dies’. Therefore, $T^t_{growth}$ reflects both how smoothly the knowledge accumulation goes and how promising the topic is at timestamp $t$. As knowledge enrichment eventually brings about scientific impact, $T^t_{growth}$ illustrates the long-term cumulative impact of a topic.

S2.3.2 $T^t_{structure}$

For a thermodynamic system with freedom to vary its volume, temperature and pressure, the variation in internal energy $dU$ is given by $dU = T\,dS - P\,dV + m\,dn$, where $T$ is the temperature, $P$ the pressure, $dV$ the volume change, $m$ the particle mass and $dn$ the change in the number of particles. The temperature $T$ for an evolving network with fixed node number can be derived as $T = \frac{dU}{dS}$. It has been shown that, with appropriate thermodynamic representations and some approximations, this relation is able to detect the critical events in a dynamic network.

Inspired by the above literature, we define $T^t_{structure}$ as:

$$T^t_{structure} = \left| \frac{dU^t}{dS^t\, |V^t|} \right| = \left| \frac{\bar{U}^t - U^{t-1}}{(\bar{S}^t - S^{t-1})\, |V^t|} \right|$$

where $S^{t-1}$ and $\bar{S}^t$ are the von Neumann entropies of $G^{t-1}$ and $\bar{G}^t$, and $U^{t-1}$ and $\bar{U}^t$ the internal energies. $\bar{G}^t$ is a weighted reduced graph of $G^t$. It has all the nodes and edges of $G^{t-1}$. Besides, $\bar{G}^t$ contains virtual citations deduced from the new nodes arriving between timestamps $t-1$ and $t$. Intuitively, $T^t_{structure}$ can be interpreted as the average structural change brought by an article in $G^t$.

The transformation from $G^t$ to $\bar{G}^t$ boils down to 2 tasks: remove new nodes and add virtual citations when possible. The edge weight of a real citation is 1. For every new node $x$, we distinguish 2 cases:

• If $x$ has only 1 parent node $p_x$, then remove $x$. If $x$ has child node(s) $c_x$, connect it (them) to $x$’s unique parent node and set the edge weight $A_{p_x c_x} = A_{x c_x}$. Intuitively, since $x$ only cites 1 paper, its arrival cannot give us extra information about whether any node pair in $G^{t-1}$ without a citation between them shares some research content.

• If $x$ has multiple parent nodes, find all its "youngest" ancestor nodes in $G^{t-1}$. If a parent node $p_x$ is in $G^{t-1}$, then $p_x$ is already a "youngest" ancestor node. Otherwise, iteratively find $p_x$’s predecessors until they are in $G^{t-1}$. Denote $x$’s youngest ancestor nodes in $G^{t-1}$ by $(a_1, a_2, ..., a_m)$. Next, for each ancestor pair $(a_i, a_j)$ between which there is no edge in $G^{t-1}$, add a virtual link according to their publishing years $y_i$, $y_j$ ($A$ denotes the real-time adjacency matrix and $m$ the total number of $x$’s youngest ancestor nodes):

  – If $y_i < y_j$, add a directed weighted edge from $a_i$ to $a_j$ with weight proportional to $\frac{\sum_{p_x} A_{p_x x}}{m(m-1)}$. The new edge means "$a_j$ virtually cites $a_i$".

  – If $y_i > y_j$, add a directed weighted edge from $a_j$ to $a_i$ with weight proportional to $\frac{\sum_{p_x} A_{p_x x}}{m(m-1)}$. The new edge means "$a_i$ virtually cites $a_j$".

  – If $y_i = y_j$, add a bidirectional weighted edge between $a_i$ and $a_j$ of weight $\frac{\sum_{p_x} A_{p_x x}}{m(m-1)}$. The new edge means "$a_j$ and $a_i$ virtually cite each other".

Fig. 2(b) illustrates a simple graph shrinking case.

In case of a duplicate virtual link, we discard it.
In other words, we always keep the first virtual link added between a node pair. Remove $x$ after adding all possible virtual links. Intuitively, since $x$ cites several papers, we can guess that these papers are somehow loosely connected to one another even if there are no direct citations among them. That is why we add virtual citations of weight less than 1.

We set $U^{t-1}$ and $\bar{U}^t$ to be the sum of edge weights. As an authentic citation has a weight of 1, $U^{t-1}$ reduces to the number of edges, $U^{t-1} = |E^{t-1}|$. Therefore, if we denote by $\bar{A}^t$ and $\bar{E}^t$ the adjacency matrix and the edge set of $\bar{G}^t$ respectively,

$$\bar{U}^t - U^{t-1} = \sum_{(u,v) \in \bar{E}^t \setminus E^{t-1}} \bar{A}^t_{uv}$$

We approximate $S^t$ and $\bar{S}^t$ by node degree. The von Neumann entropy for a directed graph is the sum of the von Neumann entropies of its strongly connected (SC) components:

$$S = \sum_{SC} S_{SC}$$

Now assume strong connectivity. We extend the entropy computation for an unweighted directed graph to that for a weighted directed graph $G = (V, E)$. First define some notations:

Bidirectional edge set $E_{bd}$: $E_{bd} = \{(u,v) \mid (u,v) \in E \text{ and } (v,u) \in E\}$

Adjacency matrix $A$:

$$A_{uv} = \begin{cases} w_{uv} & \text{if } (u,v) \in E \\ 0 & \text{otherwise} \end{cases}$$

In-degree and out-degree of node $u$:

$$d^{in}_u = \sum_{v \in V} A_{vu}, \qquad d^{out}_u = \sum_{v \in V} A_{uv}$$

Transition matrix $P$:

$$P_{uv} = \begin{cases} \frac{A_{uv}}{d^{out}_u} & \text{if } (u,v) \in E \\ 0 & \text{otherwise} \end{cases}$$

Normalized Laplacian matrix $\tilde{L}$:

$$\tilde{L}_{uv} = \begin{cases} 1 & \text{if } u = v,\ d^{out}_v \neq 0 \\ -\frac{A_{uv}}{\sqrt{d^{out}_u}\sqrt{d^{out}_v}} & \text{if } u \neq v,\ (u,v) \in E \\ 0 & \text{otherwise} \end{cases}$$

$\tilde{\lambda}_s$ denotes a normalized Laplacian eigenvalue and $\phi$ the unique left eigenvector of the transition matrix $P$, with $\Phi = \mathrm{diag}(\phi(1), ..., \phi(|V|))$.

The von Neumann entropy of $G$ is the Shannon entropy associated with the normalized Laplacian eigenvalues. By adopting the quadratic approximation to the Shannon entropy (i.e. $-x \ln x \approx x(1-x)$), we have

$$S = -\sum_{s=1}^{|V|} \frac{\tilde{\lambda}_s}{|V|} \ln \frac{\tilde{\lambda}_s}{|V|} \approx \sum_{s=1}^{|V|} \frac{\tilde{\lambda}_s}{|V|} \left(1 - \frac{\tilde{\lambda}_s}{|V|}\right) = \frac{tr(\tilde{L})}{|V|} - \frac{tr(\tilde{L}^2)}{|V|^2} = 1 - \frac{tr(\tilde{L}^2)}{|V|^2}$$

Now we expand the equation $tr(\tilde{L}^2) = |V| + \frac{1}{2}\left(tr(P^2) + tr(P \Phi^{-1} P^T \Phi)\right)$ for $G$:

$$tr(P^2) = \sum_{u \in V} \sum_{v \in V} P_{uv} P_{vu} = \sum_{(u,v) \in E_{bd}} \frac{A_{uv} A_{vu}}{d^{out}_u d^{out}_v}$$

$$tr(P \Phi^{-1} P^T \Phi) = \sum_{u \in V} \sum_{v \in V} P_{uv}^2 \frac{\phi(u)}{\phi(v)} = \sum_{(u,v) \in E} \frac{\phi(u)}{\phi(v)} \cdot \frac{A_{uv}^2}{(d^{out}_u)^2}$$

Combining the simplifications, we have an approximation of $G$’s entropy:

$$S = 1 - \frac{1}{|V|} - \frac{1}{2|V|^2} \left( \sum_{(u,v) \in E_{bd}} \frac{A_{uv} A_{vu}}{d^{out}_u d^{out}_v} + \sum_{(u,v) \in E} \frac{\phi(u)}{\phi(v)} \cdot \frac{A_{uv}^2}{(d^{out}_u)^2} \right)$$

Finally, we obtain $S^t$ and $\bar{S}^t$:

$$S^t = \sum_{SC} S^t_{SC} = \sum_{SC} \left[ 1 - \frac{1}{|V_{SC}|} - \frac{1}{2|V_{SC}|^2} \left( \sum_{(u,v) \in E_{SC,bd}} \frac{1}{d^{out}_u d^{out}_v} + \sum_{(u,v) \in E_{SC}} \frac{1}{(d^{out}_u)^2} \cdot \frac{\phi(u)}{\phi(v)} \right) \right]$$

$$\bar{S}^t = \sum_{SC} \bar{S}^t_{SC} = \sum_{SC} \left[ 1 - \frac{1}{|V_{SC}|} - \frac{1}{2|V_{SC}|^2} \left( \sum_{(u,v) \in E_{SC,bd}} \frac{\bar{A}^{SC}_{uv} \bar{A}^{SC}_{vu}}{d^{out}_u d^{out}_v} + \sum_{(u,v) \in E_{SC}} \frac{\phi(u)}{\phi(v)} \cdot \frac{(\bar{A}^{SC}_{uv})^2}{(d^{out}_u)^2} \right) \right]$$

S2.4 Node Knowledge Temperature

We employ the heat equation to compute node knowledge temperature. For a node $u$, its temperature change $\frac{dT^t_u}{dt}$ is:

$$\frac{dT_u}{dt} = \sum_{i=1}^{|V^t|} \tilde{A}^t_{iu} (T_i - T_u)$$

where $\tilde{A}^t_{iu}$ is defined as:

$$\tilde{A}^t_{iu} = A^t_{iu} \cdot \left( [\ldots] + \frac{DiffIdx_{i,u} - \min_{(a,b) \in E^t} DiffIdx_{a,b}}{\max_{(a,b) \in E^t} DiffIdx_{a,b} - \min_{(a,b) \in E^t} DiffIdx_{a,b}} \right)$$

$DiffIdx$ is defined previously in the subsection on the topic skeleton tree. $\tilde{A}_{iu}$ is the thermal conductivity between node $i$ and node $u$.

Before the heat diffusion, we need to fix the temperature of certain nodes and to specify the number of iterations of the heat equation. We assume that the pioneering work is the hottest and all the inactive papers are the coldest. An article $u$ is considered inactive if either of the following criteria is met:

1. $u$ does not have any citation until timestamp $t$.

2. $u$ joined the topic before timestamp $t-1$ and does not have any new citations between timestamps $t-1$ and $t$.

We first diffuse heat backward by transposing the adjacency matrix $\tilde{A}$ for 1 iteration, then forward for $\lfloor avgStep \rfloor$ iterations. $avgStep$, defined during skeleton tree extraction, can be interpreted as the average number of hops between 2 random nodes in $G^t$. Backward propagation models the popularity gain of an idea thanks to the newcomers, and forward propagation models the heat diffusion due to the inheritance of topic knowledge.

We obtain node knowledge temperatures ranging from 0 to 1 after applying the heat equation. The last step is to scale node knowledge temperature by topic knowledge temperature. Denoting by $T^t_{u,std}$ and $T^t_u$ node $u$'s temperatures before and after the scaling, and by $T^t_{std}$ the average node knowledge temperature before the scaling, we have

$$T^t_u = T^t_{u,std} \cdot \frac{T^t}{T^t_{std}}$$

Forest helping is designed for a group of similar topics. Through this mechanism, thriving topics "transfuse" a small part of their energy to other stagnant sister topics. The helping does not change the total energy of the topic group:

$$\sum_{j=1}^{K} c\, n^t_j T^t_j = \sum_{j=1}^{K} c\, n^t_j T^t_{j,forest}$$

where $K$ is the number of topics in a group and $T^t_{j,forest}$ the average temperature of topic $j$ after the helping.

If all topics in the group are hotter than in the last period, no helping takes place. Otherwise, all of the topics with a rising knowledge temperature help the rest.

We model the probability that "a thriving topic is willing to help others" as following a beta distribution $B([\ldots], \sum_{j=1}^{K} a_j)$, $a_j$ being topic age. The beta distribution varies from 0 to 1, which corresponds to the options "not help" and "help with all I have". We assume a prosperous topic will give an amount of energy equal to the expectation of the distribution. Hence, at time $t$, the energy that a topic gives away is proportional to its own knowledge temperature and is inversely proportional to the ages of the entire group:

$$\Delta E = \frac{c\, n^t}{[\ldots] + \sum_{j=1}^{K} a^t_j}\, T^t$$

The energy received by each topic in need of help is proportional to its node number. Therefore, they have an identical increase in their knowledge temperatures:

$$\delta T = \frac{\Delta E}{\sum_j n^t_j}$$

As topics mature, their initially close connection in thoughts will wear off over time. Consequently, the amount of energy transmitted through forest helping will decrease.
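The node knowledge temperature procedure of S2.4 (conductivity, fixed hot and cold nodes, one backward pass, $\lfloor avgStep \rfloor$ forward passes, then rescaling) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the explicit Euler step size `dt`, the starting temperature of 0 for non-fixed nodes, and all function names are assumptions. The constant offset in the conductivity is left as a required parameter because its value is not given here.

```python
import math
import numpy as np


def conductivity(A, diff_idx, offset):
    """A_tilde = A * (offset + min-max normalised DiffIdx over existing edges).

    `offset` is the constant term of the conductivity formula (value assumed by the caller).
    """
    A = np.asarray(A, dtype=float)
    D = np.asarray(diff_idx, dtype=float)
    edges = A > 0
    lo, hi = D[edges].min(), D[edges].max()
    scaled = (D - lo) / (hi - lo) if hi > lo else np.zeros_like(D)
    return A * (offset + scaled)           # zero wherever A has no edge


def diffuse(T, K, fixed, dt):
    """One explicit Euler step of dT_u/dt = sum_i K[i, u] * (T[i] - T[u])."""
    inflow = K.T @ T                       # sum_i K[i, u] * T[i]
    strength = K.sum(axis=0)               # sum_i K[i, u]
    T_new = T + dt * (inflow - strength * T)
    T_new[fixed] = T[fixed]                # pioneering and inactive nodes stay fixed
    return T_new


def node_temperatures(K, pioneer, inactive, avg_step):
    """Temperatures in [0, 1]: pioneer fixed at 1, inactive nodes fixed at 0."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    T = np.zeros(n)                        # assumption: non-fixed nodes start at 0
    T[pioneer] = 1.0
    fixed = np.zeros(n, dtype=bool)
    fixed[pioneer] = True
    fixed[np.asarray(list(inactive), dtype=int)] = True
    # Assumption: a step small enough that each update is a convex combination.
    max_strength = max(K.sum(axis=0).max(), K.sum(axis=1).max())
    dt = 0.5 / max_strength if max_strength > 0 else 0.0
    T = diffuse(T, K.T, fixed, dt)         # 1 backward iteration (transposed matrix)
    for _ in range(math.floor(avg_step)):  # floor(avgStep) forward iterations
        T = diffuse(T, K, fixed, dt)
    return T


def rescale(T_std, topic_temperature):
    """T_u = T_u,std * T^t / T_std, with T_std the mean temperature before scaling."""
    return T_std * topic_temperature / T_std.mean()


# Toy chain 0 -> 1 -> 2 with node 0 as the pioneering work.
chain = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
T_chain = node_temperatures(chain, pioneer=0, inactive=[], avg_step=2.5)
T_scaled = rescale(T_chain, 200.0)
```

On this toy chain the temperature falls off with distance from the pioneering node, which is the qualitative pattern the skeleton-tree figures describe.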

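The forest helping mechanism described above can likewise be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the first beta parameter `alpha` defaults to 1 and the constant `c` defaults to 1 purely as assumptions, the donors' temperature decrease is inferred from the stated conservation of total energy, and the function name is hypothetical.

```python
import numpy as np


def forest_helping(temps, prev_temps, sizes, ages, c=1.0, alpha=1.0):
    """Redistribute energy c * n_j * T_j from rising topics to the others.

    alpha is the first parameter of the beta distribution (assumed value);
    the second parameter is the sum of topic ages.
    """
    temps = np.asarray(temps, dtype=float)
    prev_temps = np.asarray(prev_temps, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    ages = np.asarray(ages, dtype=float)

    rising = temps > prev_temps
    # No helping if every topic is hotter than last period (or none is).
    if rising.all() or not rising.any():
        return temps.copy()

    share = alpha / (alpha + ages.sum())                 # mean of Beta(alpha, sum of ages)
    given = c * sizes[rising] * temps[rising] * share    # Delta E for each donor

    helped = temps.copy()
    helped[rising] -= given / (c * sizes[rising])        # donors cool down accordingly
    # Receivers share the energy in proportion to size: identical temperature gain.
    delta_T = given.sum() / (c * sizes[~rising].sum())
    helped[~rising] += delta_T
    return helped


# Two sister topics: the first is rising, the second is cooling.
sizes = np.array([1000.0, 500.0])
before = np.array([100.0, 50.0])
after = forest_helping(before, prev_temps=[90.0, 60.0], sizes=sizes, ages=[4, 5])
```

With `c = 1` the receivers' gain reduces to $\Delta E / \sum_j n^t_j$, the form given in the text, and the total $\sum_j c\, n^t_j T^t_j$ is the same before and after the call.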
S3 Experiments

We first present our results and analysis for individual topic, next discuss the forest helping results for topic group. Note that most of the data for 2020 only cover the first 2 months, therefore the latest temperature is not definite. The data in the tables are rounded to 3 decimal places. We set two constants in $T^t_{growth}$'s calculation as $R = [\ldots]$, $c = 1$. For topics with more than 5000 articles, the coefficient $k = 10$ in $T^t_{growth}$'s computation. Else, $k = [\ldots]$.

S3.1 Rising Topics

S3.1.1 Regulatory T Cells: Mechanisms of Differentiation and Function

The topic has been thriving ever since its birth in 2012 (Fig. S1). It has a very stable annual growth of $T^t$ and $T^t_{growth}$, which corresponds with its seemingly uniform publishing rhythm: an annual publication count always over 10% of the total size between 2013 and 2019. In addition, popular child papers came at a steady speed between 2012 and 2015. They have helped maintain a stable knowledge accumulation.

$T^t_{structure}$ remains tiny, suggesting that this topic has a gradual knowledge structure progression and has not experienced a sudden short-term impact gain. Indeed, although we observe constant visible development in the skeleton tree, we do not see any disruptive changes in the overall structure (Fig. S2). Under the leadership of several popular child papers, the topic has succeeded in developing some sub-directions, as is reflected by the fact that multiple non-trivial branches have been gradually growing out of the central cluster led by the pioneering work. Yet so far the pioneering paper remains the absolute topic center. Moreover, tiny twigs are forming around the center at a seemingly uniform speed, which may be a good sign for more novel research focuses. The vigor of the skeleton tree shows again the topic's slowly yet firmly rising popularity and impact.

Figure S1. Regulatory T cells: topic statistics and knowledge temperature evolution. (Table columns: year, $|V^t|$, $|E^t|$, $n_t$, $V_t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.)

Now we closely examine its latest skeleton tree (Fig. S3). Almost all the hottest articles surround the pioneering paper, and node knowledge temperature decreases globally as the articles are located farther away from the pioneering paper. Note that the blue nodes that surround the pioneering work are articles with little development within the topic. If we set aside these coldest papers, the heat distribution fits the general rules "the older the hotter" (Fig. 5(a)) and "the more influential the hotter" (Fig. S49(a)). Nonetheless, there are exceptions. Age and citations are no guarantee of heat-level. For example, popular child paper ‘Transcription factor Foxp3 and its protein partners form a complex regulatory network’ is colder than some of its child papers in the research branch it leads. The intrinsic difference of their research ideas, which is partly reflected by the average heat-level of their citations, causes the temperature difference. Besides, we also identify some young and hot articles. For example, 2 papers published in 2017, ‘TNFR2: A Novel Target for Cancer Immunotherapy’ (TNFR2) and ‘Crosstalk between Regulatory T Cells and Tumor-Associated Dendritic Cells Negates Anti-tumor Immunity in Pancreatic Cancer’, and 1 paper published in Nature Immunology in 2018, ‘c-Maf controls immune responses by regulating disease-specific gene networks and repressing IL-2 in CD4+ T cells’, all have a knowledge temperature above average. All of them have already inspired several works. Their popularity not only manifests the boosting effect of new articles on original work, but also shows the lasting activity of this topic. Overall, these atypical examples suggest that the positive correlation between node knowledge temperature and age or pure impact in terms of citation statistics is weak.

In particular, we find the knowledge temperature evolution of paper ‘Basic principles of tumor-associated regulatory T cell biology’ (BPTRT), published in 2013 in the journal Trends in Immunology, very interesting. This article is the parent paper of ‘TNFR2: A Novel Target for Cancer Immunotherapy’ in 2020's skeleton tree. Its temperature dropped from 213.26 to around 170 between 2013 and 2016 despite the fact that it had new followers and that the whole topic became hotter during this period. By the end of the following year, its temperature skyrocketed to around 330. The sudden gain is the result of an accumulated influence during the period 2013-2016 and the global heat diffusion owing to the topic's gradual development. Its temperature has mildly climbed since 2016, which is in accordance with topic knowledge temperature dynamics. Its promising child, TNFR2, has helped keep BPTRT's heat-level with its own development. This example well illustrates a child article's role in maintaining its parent paper's popularity and impact.

Figure S2. Regulatory T Cells: Skeleton tree evolution. Panels: (a) skeleton tree until 2013; (b) skeleton tree until 2015; (c) skeleton tree until 2017; (d) skeleton tree until 2019.

We observe in addition a certain clustering effect in the skeleton tree. For example, almost all direct children of paper ‘Pregnancy imprints regulatory memory that sustains anergy to fetal antigen’ have research themes similar to its own (Table S3). This confirms the effectiveness of our skeleton tree extraction algorithm.

Table S3. Regulatory T Cells: Clustering effect example. First line is the parent paper and the rest children.

Title | Year
Pregnancy imprints regulatory memory that sustains anergy to fetal antigen predictions using deep neural networks | 2012
Mechanisms of T cell tolerance towards the allogeneic fetus | 2013
Pregnancy Complications and Unlocking the Enigma of Fetal Tolerance Regulatory T Cells: New Keys for Further | 2014
Regulatory T Cells: New Keys for Further Unlocking the Enigma of Fetal Tolerance and Pregnancy Complications | 2014
The immunology of pregnancy: regulatory T cells control maternal immune tolerance toward the fetus | 2014
Regulatory T Cells: Types, Generation and Function | 2014
Daughter's Tolerance of Mom Matters in Mate Choice | 2015
Regulatory T cells in embryo implantation and the immune response to pregnancy | 2018
Alloreactive fetal T cells promote uterine contractility in preterm labor via IFN-γ and TNF-α |

S3.1.2 Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

As is shown by the basic statistics and $T^t$, the topic is keeping its popularity and steadily gaining impact (Fig. S4). Its popular child papers came at a steady speed between 2015 and 2017. Apart from enriching the topic knowledge pool with their own ideas, they also attracted new researchers' attention and thus have helped maintain a stable knowledge accumulation. The topic has been accelerating its expansion since 2017. It witnessed its biggest annual publication count in 2019. Yet as most child papers published no earlier than 2018 have had little development, the publication surge did not result in a significant rise in $T^t$.

$T^t_{structure}$ remains tiny compared to $T^t_{growth}$, suggesting that the topic has a gradual knowledge structure progression and has not experienced a sudden short-term impact gain. Indeed, although its skeleton tree has constant visible development (Fig. S5), so far no child paper is able to defy the absolute authority of the pioneering paper, the center of the biggest cluster. Several popular child papers have each led a research sub-field in the topic, as is depicted by the small bundles extending from the central cluster. In particular, popular child paper ‘LSTM: A Search Space Odyssey’ in 2017 has inspired 2 schools of thought. The maturation of these newly emerged research directions accounts for a higher $T^t_{structure}$ in the first years of the topic. Overall, we observe a universal non-trivial growth in the skeleton tree. The vigor of the skeleton tree shows again the slowly yet firmly increasing popularity and impact of this topic.

Now we closely examine its latest skeleton tree (Fig. S6). The decrease in node knowledge temperature from the root, the pioneering work, to the leaves is obvious, which accords with the general rule "the older the hotter" (Fig. 5(b)). Note that the blue nodes that surround the pioneering work and popular child papers are articles with little development within the topic. In particular, the heat distribution is rather concentrated in old papers. This phenomenon is in line with our above observation that young child papers have little authority in the topic. The limited heat diffusion is also why most popular child papers have a node knowledge temperature no greater than average. This topic is quite young. It needs more time to fully explore the potential of new ideas and to trigger a thorough heat diffusion in its range.

In particular, we find the knowledge temperature evolution of the second most-cited paper, ‘An Empirical Exploration of Recurrent Network Architectures’, published in 2015 at the International Conference on Machine Learning, very interesting (Fig. S6). This article became much hotter from 2015 to 2016 thanks to its numerous child papers. However, its temperature halved from 182.578 to 89.19 the next year upon the arrival of the third most-cited paper, ‘LSTM: A Search Space Odyssey’, the leader of the right major branch in the skeleton tree (Fig. S5 (b,c)). Since then, its temperature has been slightly decreasing, to around 80 in 2020. The sudden drop is a vivid illustration of the rivalry within the topic.

Figure S3. Regulatory T Cells: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 55 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 3 times.

Figure S4. GRU: topic statistics and knowledge temperature evolution. (Table columns: year, $|V^t|$, $|E^t|$, $n_t$, $V_t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.)

We observe in addition a certain clustering effect in the skeleton tree. For example, almost all direct children of paper ‘Machine Health Monitoring Using Local Feature-Based Gated Recurrent Unit Networks’ study the industrial applications of gated recurrent unit networks (Table S4). This illustrates the effectiveness of our skeleton tree extraction algorithm.

Table S4. GRU: Clustering effect example. First line is the parent paper and the rest children.

Title | Year
Machine Health Monitoring Using Local Feature-Based Gated Recurrent Unit Networks | 2018
Integrating Convolutional Neural Network and Gated Recurrent Unit for Hyperspectral Image Spectral-Spatial Classification | 2018
Comparison of Deep learning models on time series forecasting : a case study of Dissolved Oxygen Prediction | 2019
Anomaly Detection of Wind Turbine Generator Based on Temporal Information | 2019
Energy price prediction based on independent component analysis and gated recurrent unit neural network | 2019
Condition monitoring of wind turbines based on spatio-temporal fusion of SCADA data by convolutional neural networks and gated recurrent units | 2019
Intelligent Fault Diagnosis of Rolling Bearing Using Adaptive Deep Gated Recurrent Unit | 2019
Abnormality Diagnosis Model for Nuclear Power Plants Using Two-Stage Gated Recurrent Units | 2020

S3.1.3 Neural networks for pattern recognition

The topic gained popularity and impact steadily in its first 10 years, as is shown by its increasing size and $T^t$ (Fig. S7). During this period, influential child papers within the topic, namely ‘Pattern Recognition and Neural Networks’ (PRNN) published in 1996 and ‘A Tutorial on Support Vector Machines for Pattern Recognition’ (SVMPR) published in 1998, shaped the skeleton tree together with the pioneering work. Their enrichment of the topic knowledge structure accounts for a slightly higher $T^t_{structure}$ back then, which is manifested by the formation of 2 clusters in the skeleton tree (Fig. S8). Yet the pioneering work is still the absolute authority in the topic. In particular, the cluster at the top is led by PRNN and the small top-left cluster surrounds SVMPR (Fig. S9). Meanwhile, their arrival pushed up $T^t_{growth}$, as they also enlarged the knowledge base through the descendants they share with the pioneering work. Afterwards, despite a constant increase in total size, the topic's $T^t$ increment has slowed down. The popular child papers coming after 2000, namely ‘Boosting the differences: A fast Bayesian classifier neural network’ published in 2000, ‘A tutorial on support vector regression’ published in 2004 and ‘Data Mining: Concepts and Techniques’ published in 2011, have mostly extended the sub-field led by SVMPR. Judging from the skeleton tree, they have not contributed as much as their antecedents (Fig. S8). As a result, the topic has been accumulating its knowledge and popularity much more slowly than before. Nonetheless, globally speaking, this is a rising topic.

Figure S5. GRU: Skeleton tree evolution. Panels: (a) skeleton tree until 2015; (b) skeleton tree until 2016; (c) skeleton tree until 2017; (d) skeleton tree until 2018.

Figure S6. GRU: Galaxy map and current skeleton tree. Papers with more than 60 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 3 times.

Figure S7. Pattern recognition: topic statistics and knowledge temperature evolution. (Table columns: year, $|V^t|$, $|E^t|$, $n_t$, $V_t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.)

Now we closely examine the interior of this topic. 20 years of development allows a full exploration of the mainstream ideas and a thorough heat diffusion within the topic (Fig. S8). Today, the most popular child papers all have a node knowledge temperature above average (Fig. S9) and they serve as heat sources together with the pioneering work. As the articles are located farther away from them, node knowledge temperature decreases globally. Node knowledge temperature also drops evenly with article age (Fig. 5(c)). The drastic heat-level drop at the greatest ages is due to the fact that the topic contains several articles published earlier than the pioneering work, and these articles have few followers. Besides, the blue nodes that surround the pioneering work and the most popular child papers are papers with few or no in-topic citations. However, even if we set aside these oldest articles and the aforementioned papers with little subsequent development, the general rule "the older the hotter" is not robust. For example, article ‘Data Mining: Practical Machine Learning Tools and Techniques’ (DM) published in 1999 is slightly colder than its child papers ‘Discriminative vs. Generative Classifiers: An In-Depth Experimental Comparison using Cost Curves’ (DGC) published in 2005 and ‘Feature selection and classification in multiple class datasets’ (FSC) published in 2011. DM is coloured orange while DGC and FSC are coloured orange-red. This is due to the intrinsic difference of their content, which is reflected by their distinct citations. This example also suggests that the general rule "the more influential the hotter" is weak (Fig. S49(c)).

We observe in addition a certain clustering effect in the skeleton tree. For example, all child papers of ‘Selection of input parameters to model direct solar irradiance by using artificial neural networks’ study the topic's application in energy radiation (Table S5). This confirms the effectiveness of our skeleton tree extraction algorithm.

Figure S8. Pattern recognition: Skeleton tree evolution. Panels: (a) skeleton tree until 1998; (b) skeleton tree until 2001; (c) skeleton tree until 2004; (d) skeleton tree until 2007; (e) skeleton tree until 2010; (f) skeleton tree until 2016.

Table S5. Pattern recognition: Clustering effect example. First line is the parent paper and the rest children.

Title | Year
Selection of input parameters to model direct solar irradiance by using artificial neural networks | 2004
Estimation of Surface Solar Radiation with Artificial Neural Networks | 2008
Improvement of temperature-based ANN models for solar radiation estimation through exogenous data assistance | 2011
Splitting Global Solar Radiation into Diffuse and Direct Normal Fractions Using Artificial Neural Networks | 2012
Prediction of daily global solar irradiation data using Bayesian neural network: A comparative study | 2012
Assessment of ANN and SVM models for estimating normal direct irradiation (Hb) | 2016

Figure S9. Pattern recognition: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 230 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 5 times.

S3.2.1 Critical Power for Asymptotic Connectivity in Wireless Networks

As is shown by the basic statistics and $T^t_{growth}$, the topic reached its peak around 2011 (Fig. S10). The decline in scale growth and $T^t_{growth}$ is obvious afterwards. The majority of popular child papers were published no later than 2004. They pushed up $T^t_{growth}$ with their new ideas and contributed to the flourishing before 2010. In particular, popular child papers ‘The capacity of wireless networks’ published in 2000 and ‘The number of neighbors needed for connectivity of wireless networks’ published in 2004 each lead a non-trivial research sub-direction, visible as clusters in the skeleton tree (Fig. S12). Their substantial extension of the topic knowledge structure is additionally illustrated by a high $T^t_{structure}$ in the early days. However, the glory did not last long. After 2010, the continuous lack of young influential child papers gradually resulted in a decreasing topic visibility and thus a shrinking inflow of useful information, its knowledge source. The trend is also reflected in the stagnation of the skeleton tree. While we are still able to detect some development on the periphery of all 3 clusters from 2007 to 2011, the skeleton tree seems to take a definitive form after 2011. The snapshots look almost identical (Fig. S11). Consequently, both $T^t_{growth}$ and $T^t_{structure}$ have plunged. After 10 years of golden age, the topic is now perishing.

Figure S10. Critical Power: topic statistics and knowledge temperature evolution. (Table columns: year, $|V^t|$, $|E^t|$, $n_t$, $V_t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.)

Now we closely examine the heat distribution within the topic (Fig. S12). We observe a quick heat diffusion during the flourishing period (Fig. S11(b,c)). Now heat diffusion is complete, as popular child papers all have a knowledge temperature above average and the child papers published during the golden period are relatively hot in general (Fig. 5(d)). An obvious exception lies in the oldest child papers. Their low average temperature is because they were published at the same time as or earlier than the pioneering work and they have few or no followers. Besides the pioneering work, popular child paper ‘The capacity of wireless networks’ is also a heat source within the topic. As articles are located farther away from them, they gradually cool down. The blue nodes that surround the pioneering work and the popular child paper ‘The capacity of wireless networks’ in central clusters are papers with few or no in-topic followers. However, the general rules "the older the hotter" and "the more influential the hotter" (Fig. S49(d)) are not robust. For instance, paper ‘New perspective on sampling-based motion planning via random geometric graphs’ (SBMP) published in 2018 is hotter than its parent, ‘CONNECTIVITY OF SOFT RANDOM GEOMETRIC GRAPHS’ (CSRG), an article published in 2016. SBMP has an average knowledge temperature while CSRG has a temperature below average. This can be mainly attributed to their different research focuses, which are reflected by their distinct citations and their citations' average heat-level. Another reason may be that even though CSRG has had a much better development, the dozen articles it has inspired have gained little popularity and impact, thus they do not help boost CSRG's status.

We find article ‘Power Control in Ad-Hoc Networks: Theory, Architecture, Algorithm and Implementation of the COMPOW Protocol’ particularly interesting. It is not a cluster center, nor does it have many articles around it, yet it has a big structure entropy and a very high knowledge temperature. We think this is due to its strategic position, right between the 2 clusters respectively led by ‘The capacity of wireless networks’ and ‘The number of neighbors needed for connectivity of wireless networks’. The article itself may not have a big impact, but it has inspired a handful of influential works. Its value lies in enlightenment.

Figure S11. Critical Power: Skeleton tree evolution. Panels: (a) skeleton tree until 2003; (b) skeleton tree until 2007; (c) skeleton tree until 2011; (d) skeleton tree until 2015.

We observe in addition a certain clustering effect in the skeleton tree (Table S6). For example, almost all child papers of ‘CONNECTIVITY OF SOFT RANDOM GEOMETRIC GRAPHS’ have research themes similar to its own. This confirms the effectiveness of our skeleton tree extraction algorithm.

Table S6. Critical Power: Clustering effect example. First line is the parent paper and the rest children.

Title | Year
CONNECTIVITY OF SOFT RANDOM GEOMETRIC GRAPHS | 2016
Isolation and Connectivity in Random Geometric Graphs with Self-similar Intensity Measures | 2018
On Resilience and Connectivity of Secure Wireless Sensor Networks Under Node Capture Attacks | 2017
New perspective on sampling-based motion planning via random geometric graphs | 2018

S3.2.2 The capacity of wireless networks

As is shown by $T^t$, the topic reached its peak around 2007 (Fig. S13). The batch of popular child papers arriving between 2001 and 2004, namely ‘Capacity of Ad hoc wireless networks’, ‘Mobility increases the capacity of ad-hoc wireless networks’, ‘A network information theory for wireless communication: scaling laws and optimal operation’ and ‘Impact of interference on multi-hop wireless network performance’, largely enriched the topic knowledge base by inspiring several research sub-fields, as is reflected by the significant structure advancement in the skeleton tree from 2003 to 2007 (Fig. S14). As a result, we observe a surge in both $T^t_{growth}$ and $T^t_{structure}$. Popular child papers continued to come until 2007, but the younger ones did not cause as much of a stir. Only 1 of them has made a visible contribution to knowledge structure evolution: ‘Closing the Gap in the Capacity of Wireless Networks Via Percolation Theory’, published in 2007, opened up a new research focus and led to the split at the end of a major branch in the skeleton tree by 2011. The decreasing exposure gained by its child papers and a decelerating evolution in knowledge pattern caused $T^t_{structure}$ to drop after 2007. But the residual attractiveness continued to draw an abundant quantity of "new blood" and ensured the rise in $T^t_{growth}$ for a while longer. After 2011, despite a continuous size expansion and a steady knowledge accumulation, the topic has been gradually phased out due to an overall mediocre development of child papers published after 2009. The wearing-off of the community's focus is illustrated by an immediate drop in $T^t_{structure}$ in 2015, which also accounts for the downward trend of $T^t$. Correspondingly, we observe fewer remarkable changes in the skeleton tree during this period. While the cooling-down before 2015 is mainly due to attention loss, the recent temperature drop is caused by a knowledge supply shortage. The focus loss has eventually resulted in diminishing publications and affected the topic's long-term knowledge accumulation. To sum up, after around 10 years of glory, the topic is now going downhill.

Now we probe into the topic and closely examine the heat distribution in its latest skeleton tree (Fig. S15). After 20 years of development, the heat diffusion is nearly complete, as popular child papers all have a knowledge temperature above average and the child papers published in the first 10 years are relatively hot in general (Fig. 5(e)). The popular child papers and the pioneering work are the multiple heat sources within the topic. If we set aside the blue nodes surrounding the pioneering work and popular child papers, which are papers with few or no in-topic citations, it is clear that node knowledge temperature decreases globally as the articles are located farther away from them. However, there are exceptions to the general rules "the more influential the hotter" (Fig. S49(e)) and "the older the hotter". For example, paper ‘Mobility increases the capacity of ad-hoc wireless networks’ (MAWN) published in 2001, which is at the junction between the central cluster and a principal branch, is slightly colder than 2 of its children: ‘Design challenges for energy-constrained ad hoc wireless networks’ (DCAWN) published in 2002 and ‘Unreliable sensor grids: coverage, connectivity and diameter’ (USG) published in 2003. MAWN is coloured orange while DCAWN and USG are coloured orange-red and red. The main reason for this uncommon phenomenon is their different research focuses, which are reflected by their distinct citations and their citations' average heat-level. Another reason may be that even though MAWN has inspired many more child papers, few of its numerous followers have so far achieved remarkable development, hence their limited boosting effect.

We observe in addition a certain clustering effect in the skeleton tree (Table S7). For example, almost all child papers of ‘A Delay-Efficient Algorithm for Data Aggregation in Multihop Wireless Sensor Networks’ have research themes similar to its own. This proves the effectiveness of our skeleton tree extraction algorithm.

Figure S12. Critical Power: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 100 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 3 times.

Figure S13. Capacity Wireless Network: topic statistics and knowledge temperature evolution. (Table columns: year, $|V^t|$, $|E^t|$, $n_t$, $V_t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.)

Table S7. Capacity Wireless Network: Clustering effect example. First line is the parent paper and the rest children.

Title | Year
A Delay-Efficient Algorithm for Data Aggregation in Multihop Wireless Sensor Networks | 2011
In-Network Estimation with Delay Constraints in Wireless Sensor Networks | 2013
Estimate Aggregation with Delay Constraints in Multihop Wireless Sensor Networks | 2011
Genetic Local Search for Conflict-Free Minimum-Latency Aggregation Scheduling in Wireless Sensor Networks | 2018
Interference-Fault Free Data Aggregation in Tree-Based WSNs | 2016
GLS and VNS Based Heuristics for Conflict-Free Minimum-Latency Aggregation Scheduling in WSN. | 2019
Data Aggregation Scheduling Algorithms in Wireless Sensor Networks: Solutions and Challenges | 2014
Efficient scheduling for periodic aggregation queries in multihop sensor networks | 2012
Layer-Based Data Aggregation and Performance Analysis in Wireless Sensor Networks | 2013
Neither Shortest Path Nor Dominating Set: Aggregation Scheduling by Greedy Growing Tree in Multihop Wireless Sensor Networks | 2011
Composite interference mapping model for Interference Fault-Free Transmission in WSN | 2015
Weighted fairness guaranteed data aggregation scheduling algorithm in wireless sensor networks | 2012
A fuzzy-rule-based packet reproduction routing for sensor networks | 2018

Figure S14. Capacity Wireless Network: Skeleton tree evolution. Panels: (a) skeleton tree until 2003; (b) skeleton tree until 2007; (c) skeleton tree until 2011; (d) skeleton tree until 2015.

Figure S15. Capacity Wireless Network: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 500 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 3 times.

The popularity and impact gain in the first years is mainly due to a fast accumulation of useful information. By the end of 2013, 2 influential child papers, ‘Linguistic Regularities in Continuous Space Word Representations’ (LRCSWR) and ‘Distributed Representations of Words and Phrases and their Compositionality’ (DRWPC), had formed the fundamentals of the topic knowledge structure. LRCSWR is the red node in the middle of the then skeleton tree and its child, DRWPC, is represented by the yellow-green node above it (Fig. S17(a)). During the next 2 years, the topic expanded quickly thanks to the substantial development of all 3 papers. DRWPC emerged as the second topic center following the pioneering work (Fig. S17(b)). In addition, DRWPC helped extend the topic knowledge structure by inspiring a new research direction. This research branch later proved to be a novel research focus. Starting from 2016, owing to a multidimensional development, the topic has been maintaining a knowledge reserve corresponding to its size, which is reflected by its steady $T^t_{growth}$ (Fig. S16). More importantly, the research branch that emerged by the end of 2015 has developed into 2 new non-trivial research directions due to the popularity rise of 2 child papers published in 2014: ‘Glove: Global Vectors for Word Representation’ (Glove) and ‘Distributed Representations of Sentences and Documents’ (DRSD). They brought new knowledge, attracted the attention of the latest research, and catalysed an accelerated evolution of the topic knowledge structure, which is captured by a rising $T^t_{structure}$. This year, there has not been any significant new trend so far. Therefore, the topic cools down a bit due to a drop in $T^t_{structure}$. Unless the topic succeeds in "breeding" some new focus or achieving some breakthrough in existing sub-topics in the near future, it will start to go downhill after 6 years of thriving.

Figure S16. Efficient word representation: topic statistics and knowledge temperature evolution. Table columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.

Now we probe into the topic and closely examine the heat distribution in its latest skeleton tree (Fig. S18). The topic’s fast development accompanies a continuous heat diffusion. The older popular child papers have been the hottest since 2015 and the younger ones, namely DRSD and Glove, have recently evolved into the topic’s new heat sources. Node knowledge temperature clearly decreases globally as the articles are located farther away from them. This phenomenon fits the general rules "the older the hotter" (Fig. 5(f)) and "the more influential the hotter" (Fig. S49(f)). Note that the blue nodes that surround the pioneering work and popular child papers in central parts are papers with few or no in-topic citations.

We observe in addition a certain clustering effect in the skeleton tree (Table S8). For example, in the current skeleton tree, all child papers of ‘Sentiment Embeddings with Applications to Sentiment Analysis’, published in the journal IEEE Transactions on Knowledge and Data Engineering in 2016, specialize in sentiment analysis. This confirms the effectiveness of our skeleton tree extraction algorithm.

Figure S17. Efficient word representation: Skeleton tree evolution. (a) Skeleton tree until 2013; (b) Skeleton tree until 2015; (c) Skeleton tree until 2017; (d) Skeleton tree until 2019.

Figure S18. Efficient word representation: Galaxy map and current skeleton tree. Papers with more than 700 in-topic citations are labelled by title in the skeleton tree. Except for the pioneering work, the corresponding nodes’ size is amplified by 3 times.

Table S8. Efficient word representation: Clustering effect example. First line is the parent paper and the rest children.

title | year
Sentiment Embeddings with Applications to Sentiment Analysis | 2016
Deep Learning Adaptation with Word Embeddings for Sentiment Analysis on Online Course Reviews | 2020
Learning Word Representations for Sentiment Analysis | 2017
Improving Aspect-Based Sentiment Analysis via Aligning Aspect Embedding | 2019
Attention-based long short-term memory network using sentiment lexicon embedding for aspect-level sentiment analysis in Korean | 2019
Deep Learning for Aspect-Based Sentiment Analysis: A Comparative Review | 2019
An efficient preprocessing method for supervised sentiment analysis by converting sentences to numerical vectors: a twitter case study | 2019
Deep learning for sentiment analysis: A survey | 2018
Deep Learning in Sentiment Analysis | 2018
Sentiment analysis using deep learning approaches: an overview | 2020

S3.2.4 Coverage problems in wireless ad-hoc sensor networks

This topic reached its peak around 2010 thanks to a surge in $T^t_{structure}$. Most of its popular child papers were published by the end of 2006. Among them, the older ones laid the foundation of multiple research sub-directions and the younger ones further developed these new research branches. For instance, papers ‘Unreliable sensor grids: coverage, connectivity and diameter’ and ‘Sensor placement for grid coverage under imprecise detections’, published in 2002 and 2003, primarily extended the idea of the pioneering work. They formed the 2 big branches surrounding the central cluster in the skeleton tree by 2007 (Fig. S20(a), S21). Paper ‘The coverage problem in a wireless sensor network’ (CPWS), published in 2005, however, created a second, smaller cluster by furthering the study of its predecessor ‘Localized algorithms in wireless ad-hoc networks: location discovery and sensor exposure’ (LAWAN), published in 2001. Other popular papers published between 2005 and 2006 split into 2 groups, one supporting the growth of the central cluster led by the pioneering work, the other enriching the newer cluster built essentially by CPWS. As a result, we observe non-trivial growth in every corner of the skeleton tree between 2007 and 2010 (Fig. S20(b)). Nonetheless, along with the multidimensional flourishing, the knowledge structure started its gravity redistribution due to the maturation of the research sub-directions. This silent transformation is captured by the high $T^t_{structure}$ around 2010. The aforementioned popular child papers, as well as the future works they inspired, also made great contributions to the knowledge accumulation. They helped push up $T^t_{growth}$ until 2010. Afterwards, the topic experienced first an absence of promising child papers and then a decline in useful information supply due to its decelerated expansion. Consequently, $T^t_{growth}$ has stagnated. The skeleton tree has unsurprisingly lost its vigor during this period (Fig. S20(c,d)). To sum up, this topic, after a rapid development in its early days, now demonstrates a decreasing activity and a diminishing popularity and impact.

The topic’s skeleton tree is a bit special in that it is comprised of 2 parts. The separation is due to the isolation of LAWAN from the pioneering work. LAWAN cites both the pioneering work and ‘Dynamic fine-grained localization in Ad-Hoc networks of sensors’ (DLANS). Because of a closer relation between LAWAN and DLANS, its connection to the pioneering work is cut off in skeleton tree extraction. A similar reason caused the separation of DLANS and the pioneering work. LAWAN, along with several closely related papers, is thus completely separated from the pioneering work. They form a mini bundle beside the central cluster in the 2004 skeleton tree. Shortly after, the arrival of the popular child paper CPWS largely developed this tiny bundle and turned it into the big aggregation under the central cluster (Fig. S20).

Now we closely examine the heat distribution within the topic (Fig. S21). After 19 years of development, the heat diffusion is nearly complete, as most popular child papers have a knowledge temperature above average and the child papers published during the flourishing period are relatively hot in general (Fig. 5(g)). Half of the most popular child papers serve as heat sources and node knowledge temperature decreases globally as the articles are located farther away from them. This corresponds with the general rule "the older the hotter". Yet as several papers published at the same time as the pioneering work either have had little development or have not been cited by any recent works, they are the coldest and thus bring down the average knowledge temperature of the oldest articles. In addition, the blue nodes that surround the pioneering work and popular child papers are papers with few or no in-topic followers. However, we still find exceptions even if we leave aside the oldest papers. Paper ‘Minimal and maximal exposure path algorithms for wireless embedded sensor networks’ (MMEPA), published in 2003, is colder than, for instance, its children ‘Smart Path-Finding with Local Information in a Sensory Field’, published in 2006, and ‘An Algorithm for Target Traversing Based on Local Voronoi Diagram’, published in 2007. These 2 child papers are represented as orange nodes

[Topic statistics table; columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.]

S3.2.5 A neural probabilistic language model

Unlike many topics that welcome the majority of their popular child papers shortly after their birth, this topic waited for a long time. Most of its prominent child papers came between 2010 and 2014. Their arrival opened up new research sub-fields (Fig. S24) and infused much vigor and new knowledge into the topic, which strongly boosted $T^t_{growth}$ between 2011 and 2015 (Fig. S22). Although the topic continued to grow fast after 2015, few child papers stood out and none has created a new research focus so far. As a result, the knowledge accumulation process is affected by the overall quality slump and the topic has started to cool down owing to the lack of new outstanding ideas. In terms of knowledge structure evolution, the topic manifests a smooth and steady progress (Fig. S23). Since the arrival of popular child papers is quite evenly spread over 2010 to 2014, their contribution to the thriving is reflected more as knowledge and impact accumulation than as a short-term popularity gain. To conclude, after a recent boom thanks to its popular child papers, the topic is now going downhill.

The skeleton tree is a bit special because it is made up of 2 parts. This is due to the separation of paper ‘Connectionist language modeling for large vocabulary continuous speech recognition’ (CLM) from the pioneering work, the only citation CLM has within the topic. In fact, CLM was published a bit earlier than the pioneering work, therefore its relation with the pioneering work may not be tight. This results in the edge cutting during skeleton tree extraction. CLM later inspired ‘Efficient training of large neural networks for language modeling’, whose work turned out to have a greater influence on the aforementioned popular child papers than that of the pioneering work. That is why the skeleton tree finally takes a separated form.

Figure S20. Coverage problems: Skeleton tree evolution. (a) Skeleton tree until 2007; (b) Skeleton tree until 2010; (c) Skeleton tree until 2013; (d) Skeleton tree until 2016.

Figure S21. Coverage problems: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 150 in-topic citations are labelled by title in the skeleton tree. Except for the pioneering work, the corresponding nodes’ size is amplified by 3 times.

Figure S22. Neural language model: topic statistics and knowledge temperature evolution. Table columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.

Table S9. Neural language model: Clustering effect example. First line is the parent paper and the rest children.

title | year
Road2Vec: Measuring Traffic Interactions in Urban Road System from Massive Travel Routes | 2017
Knowledge Embedding with Geospatial Distance Restriction for Geographic Knowledge Graph Completion | 2019
A regionalization method for clustering and partitioning based on trajectories from NLP perspective | 2019
From Motion Activity to Geo-Embeddings: Generating and Exploring Vector Representations of Locations, Traces and Visitors through Large-Scale Mobility Data | 2019
Detecting geo-relation phrases from web texts for triplet extraction of geographic knowledge: a context-enhanced method | 2019

Now we closely examine the current heat distribution with its latest skeleton tree (Fig. S24). The pioneering work remains the only heat source in the topic and almost all of the most popular child papers have a knowledge temperature below average. Although they have indeed vitalized the topic, more importantly they themselves have proposed novel ideas that made them overshadow the pioneering work and become the new authorities in the domain (Fig. S24, galaxy map). Their relatively loose connection to the core topic idea has resulted in their low knowledge temperature. Their "coolness" is also the reason that the cluster they are in is much colder than the one led by the pioneering work. Overall, we observe the general rule "the older the hotter" (Fig. 5(h)). The blue nodes that surround the pioneering work and popular child papers are papers with few or no in-topic citations. The decrease in node knowledge temperature is clear as we walk down the paths in the skeleton tree. However, there are exceptions. Hit paper ‘A unified architecture for natural language processing: deep neural networks with multitask learning’ (UANLP), published in 2008, is colder than, for instance, its well-developed children ‘Large Scale Distributed Deep Networks’, published in 2012, and ‘Parsing Natural Scenes and Natural Language with Recursive Neural Networks’, published in 2011. These 2 child papers are represented as orange nodes yet UANLP is a yellow node. Their temperature difference lies mainly in their research focus, as reflected by their citation patterns. Although these 2 child papers both have a few followers in the latest skeleton tree, they are still less popular than their parent in terms of idea diffusion. This counter example also illustrates that the general rule "the more influential the hotter" is very weak in the topic (Fig. S49(h)).

We observe in addition a certain clustering effect in the skeleton tree (Table S9). For example, all child papers of ‘Road2Vec: Measuring Traffic Interactions in Urban Road System from Massive Travel Routes’ have a research interest related to geographic relations. This confirms the effectiveness of our skeleton tree extraction algorithm. In addition, this small bundle is very young, hence its research interest may be among the latest trends.

Figure S23. Neural language model: Skeleton tree evolution. (a) Skeleton tree until 2009; (b) Skeleton tree until 2011; (c) Skeleton tree until 2013; (d) Skeleton tree until 2015.

Figure S24. Neural language model: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 600 in-topic citations are labelled by title in the skeleton tree. Except for the pioneering work, the corresponding nodes’ size is amplified by 3 times.

S3.2.6 A unified architecture for natural language processing: deep neural networks with multitask learning

As is shown by $T^t_{growth}$ and $T^t$, the topic continuously gained fame between 2009 and 2015 (Fig. S25). Almost all of its most influential child papers were published during this period. After that, despite a steady size growth, the topic has gradually cooled down. This is because the majority of prominent child papers, namely ‘Efficient Estimation of Word Representations in Vector Space’ (EEWRVS), ‘Distributed Representations of Words and Phrases and their Compositionality’ (DRWPC) and ‘Word Representations: A Simple and General Method for Semi-Supervised Learning’ (WRSSL), were published no later than 2013. They brought large amounts of new knowledge and, more importantly, attracted much immediate attention after their publication. By the end of 2015, these child papers, having collected a fair share of in-topic citations, had already become crucial members of the topic. Together with the pioneering work, they shaped topic knowledge (Fig. S26(d)). Child papers published no earlier than 2016 enriched the ideas proposed by the aforementioned popular child papers (Fig. S26(e,f)). Very few have had a significant subsequent development even though the topic has succeeded in attracting a stable stream of recent attention. Therefore, the enrichment of the knowledge base has slowed down and thus the knowledge temperature has slightly dropped. To sum up, the topic demonstrates rise-then-fall dynamics.

The skeleton tree of this topic manifests a gradual structural advancement in line with a constantly small $T^t_{structure}$ (Fig. S26). Its popular child papers have unanimously dedicated themselves to one single research sub-direction, which is portrayed by the steadily growing big branch (Fig. S27).

Figure S25. A unified architecture for NLP: topic statistics and knowledge temperature evolution. Table columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.

Figure S26. A unified architecture for NLP: Skeleton tree evolution. (a) Skeleton tree until 2009; (b) Skeleton tree until 2011; (c) Skeleton tree until 2013; (d) Skeleton tree until 2015; (e) Skeleton tree until 2017; (f) Skeleton tree until 2019.

Figure S27. A unified architecture for NLP: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 400 in-topic citations are labelled by title in the skeleton tree. Except for the pioneering work, the corresponding nodes’ size is amplified by 3 times.

Table S10. A unified architecture for NLP: Clustering effect example. First line is the parent paper and the rest children.

title | year
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks | 2016
Automatic code generation of convolutional neural networks in FPGA implementation | 2016
Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks | 2017
Escher: A CNN Accelerator with Flexible Buffering to Minimize Off-Chip Transfer | 2017
Towards Efficient Hardware Acceleration of Deep Neural Networks on FPGA | 2018
UniCNN: A Pipelined Accelerator Towards Uniformed Computing for CNNs | 2018

Now we closely examine the internal heat distribution and its latest skeleton tree (Fig. S27). The pioneering work is the only heat source. Interestingly, half of the most popular child papers have a knowledge temperature below average. In fact, they all cited another popular child paper, WRSSL. In terms of idea inheritance, they are less close to the pioneering work than WRSSL. A bigger portion of original ideas has caused their relatively low knowledge temperature. We see a clear node knowledge temperature decline from the root to the leaves. This corresponds with the general rule "the older the hotter" (Fig. 5(i)). As the topic contains 2 articles published earlier than the pioneering work and they have few in-topic citations, the average node knowledge temperature for the oldest papers is not maximal. In addition, the blue nodes that surround the pioneering work and the most popular child papers are papers with few or no in-topic citations. However, even if we set aside the oldest papers and the aforementioned coldest papers, the general rule is violated. Hit paper ‘Learning Deep Architectures for AI’ (LDAAI), published in 2009, is colder than, for instance, its child papers ‘3D Convolutional Neural Networks for Human Action Recognition’, published in 2013, and ‘Learning structured embeddings of knowledge bases’, published in 2011. These 2 child papers are represented as orange nodes yet LDAAI is coloured yellow. This is mainly due to their relatively different research focus, as their in-topic citations do not overlap with one another. Similarly, popular child paper EEWRVS is slightly colder than its descendant, DRWPC. These counter examples also illustrate that the general rule "the more influential the hotter" is very weak in this topic (Fig. S49(i)).

We observe in addition a certain clustering effect in the skeleton tree (Table S10). For example, all child papers of ‘Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks’ have a research interest in accelerators. This confirms the effectiveness of our skeleton tree extraction algorithm.
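The exceptions to "the older the hotter" discussed in this topic are all parent-child pairs of the skeleton tree in which the child is hotter than its parent (for example LDAAI and its two orange-coloured children). As a minimal sketch of how such pairs could be listed, assuming the skeleton tree is available as a child-to-parent mapping and each node carries a knowledge temperature, the Python snippet below scans every edge. The function name, the node labels and the temperature values are hypothetical and serve only as an illustration; they are not taken from the topic data.

```python
from typing import Dict, List, Optional, Tuple


def find_age_rule_exceptions(
    parent_of: Dict[str, Optional[str]],
    temperature: Dict[str, float],
) -> List[Tuple[str, str]]:
    """Return sorted (parent, child) pairs where the child is strictly hotter."""
    exceptions = []
    for child, parent in parent_of.items():
        if parent is None:  # the root (pioneering work) has no parent
            continue
        if temperature[child] > temperature[parent]:
            exceptions.append((parent, child))
    return sorted(exceptions)


# Hypothetical toy skeleton tree: node -> parent (None marks the root).
parent_of = {
    "pioneer": None,
    "LDAAI": "pioneer",
    "child_2011": "LDAAI",
    "child_2013": "LDAAI",
    "leaf": "child_2011",
}
# Hypothetical node knowledge temperatures, for illustration only.
temperature = {
    "pioneer": 9.0,
    "LDAAI": 5.0,
    "child_2011": 6.0,
    "child_2013": 6.5,
    "leaf": 4.0,
}

exceptions = find_age_rule_exceptions(parent_of, temperature)
for parent, child in exceptions:
    print(f"{child} is hotter than its parent {parent}")
```

With these toy values, only the two children of the hypothetical LDAAI node are reported, which mirrors the kind of counter example described in the text.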

S3.2.7 Bose-Einstein condensation in a gas of sodium atoms

Founded in 1995, this topic thrived for some 20 years before starting to stagnate around 2013 (Fig. S28). While most of the highest-cited child papers within the topic came between 1997 and 2003, several came in or after 2006, namely ‘Bose-Einstein condensation of exciton polaritons’ (BECEP), published in 2006 in Nature, ‘Production of Cold Molecules via Magnetically Tunable Feshbach Resonances’, published in 2006 in Reviews of Modern Physics, and ‘Bose-Einstein condensation of photons in an optical microcavity’ (BECPOM), published in 2010 in Nature. The relay among these popular child papers maintained the topic’s flourishing for 20 years. In addition, the topic was most prolific between 2010 and 2012, with the annual publication number in each year exceeding 5% of the current topic size. The increasing inflow of knowledge, together with the exposure brought by the aforementioned popular child papers, contributed to a slightly bigger climb in $T^t$ and $T^t_{growth}$ between 2011 and 2013. After that, the topic has not so far welcomed any superstars that have incited remarkable development. Yet it still has a rather stable knowledge accumulation judging from basic statistics. Hence overall $T^t_{growth}$ has ceased to go up and so has $T^t$.

$T^t_{structure}$ is higher in the early days, which corresponds with a multi-dimensional growth in the skeleton tree thanks to influential child papers published around 2000 (Fig. S29). After 2013, the skeleton tree has fixed its structure. We observe few visible changes in the skeleton tree, namely some development in the research direction jointly led by popular child papers BECEP and BECPOM and a new small research branch deriving from the school of thought led by child paper ‘Second-Order Corrections to Mean Field Evolution of Weakly Interacting Bosons. I.’, published in 2010, and its rather successful descendant ‘Derivation of the Cubic NLS and Gross-Pitaevskii Hierarchy from Manybody Dynamics in d = 3 Based on Spacetime Norms’, published in 2014.

Now we closely examine its internal heat distribution together with its latest skeleton tree (Fig. S30). After more than 20 years of development, the heat has fully propagated to recent research directions led by popular child papers. Popular child papers are among the hottest articles and the child papers published during the flourishing period are relatively hot in general (Fig. 5(j)). The knowledge temperature decrease from cores to ends is clear. This corresponds with the general rule "the older the hotter". The blue nodes that surround the pioneering work and popular child papers in main clusters are papers with few or no in-topic citations. However, there are exceptions. Paper ‘A gapless theory of Bose-Einstein condensation in dilute gases at finite temperature’, published in 2000, is colder than its child paper ‘Theory of the weakly interacting Bose gas’ (TWIBS), published in 2004. TWIBS is also slightly colder than its direct child paper in the current skeleton tree, ‘Weakly-Interacting Bosons in a Trap within Approximate Second Quantization Approach’ (WIBTASQ), published in 2007. This is mainly due to their relatively different research focus, as most of their in-topic citations do not overlap with one another. As WIBTASQ is the least developed among the three in terms of citations, this counter example also illustrates that the general rule "the more influential the hotter" is weak (Fig. S49(j)).

Figure S28. Bose-Einstein condensation: topic statistics and knowledge temperature evolution. Table columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.

Figure S29. Bose-Einstein condensation: Skeleton tree evolution. (a) Skeleton tree until 1997; (b) Skeleton tree until 2001; (c) Skeleton tree until 2005; (d) Skeleton tree until 2009; (e) Skeleton tree until 2013; (f) Skeleton tree until 2017.

Figure S30. Bose-Einstein condensation: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 150 in-topic citations are labelled by title in the skeleton tree. Except for the pioneering work, the corresponding nodes’ size is amplified by 3 times.

Table S11. Bose-Einstein condensation: Clustering effect example. First line is the parent paper and the rest children.

title | year
Comparative analysis of electric field influence on the quantum wells with different boundary conditions: II. Thermodynamic properties | 2015
Theory of the Robin quantum wall in a linear potential. II. Thermodynamic properties | 2016
Comparative analysis of electric field influence on the quantum wells with different boundary conditions.: I. Energy spectrum, quantum information entropy and polarization | 2015
Thermodynamic Properties of the 1D Robin Quantum Well | 2018

We find the knowledge temperature evolution of child paper BECEP particularly interesting. Despite the topic’s stagnation starting from around 2013 and 2014, its knowledge temperature has been constantly on the rise since its publication, from 60.4 in 2006 to 83.5 in 2020. Its rising temperature demonstrates its above-average recent development compared to the entire topic.

We observe in addition a certain clustering effect in the skeleton tree (Table S11). For example, all child papers of ‘Comparative analysis of electric field influence on the quantum wells with different boundary conditions: II. Thermodynamic properties’ have a research interest in thermodynamics. This confirms the effectiveness of our skeleton tree extraction algorithm.
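The temperature rise reported for BECEP can be put into relative terms with simple arithmetic. The short Python sketch below, whose function name is our own illustrative choice, uses only the two values quoted in the text (60.4 in 2006 and 83.5 in 2020) and computes the relative increase and the mean change per year. It assumes nothing about the intermediate years, so the per-year figure is an average and not a fitted trend.

```python
from typing import Tuple


def temperature_change(
    t_start: float, t_end: float, year_start: int, year_end: int
) -> Tuple[float, float]:
    """Return (relative change, mean absolute change per year)."""
    delta = t_end - t_start
    return delta / t_start, delta / (year_end - year_start)


# Values quoted in the text for BECEP: 60.4 in 2006 and 83.5 in 2020.
relative, per_year = temperature_change(60.4, 83.5, 2006, 2020)
print(f"relative increase: {relative:.1%}")
print(f"mean change per year: {per_year:.2f}")
```

Under these assumptions, the increase amounts to roughly 38% over 14 years, or about 1.65 temperature units per year on average.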

S3.3 Awakened topics

S3.3.1 Long short-term memory

After a boom right after its birth, the topic hibernated for as long as 10 years before having an explosive growth. As is shown by the basic statistics, the topic’s expansion in the first 15 years is much slower than recently. Apart from the difference in publication quantity, we also observe an obvious discrepancy in articles’ contribution to the topic’s flourishing. Few child papers turned out to be popular among topic members. Child paper ‘Learning to Forget: Continual Prediction with LSTM’ (LFCP), published in 2000, is the only superstar the topic had for a long time. It successfully extended the pioneering work’s idea and founded a new research focus, represented by the branch pointing to the bottom-left in the skeleton tree (Fig. S32(b,c,d)). Although the research branch seemed small by 2001, it was already significant compared to the then topic size. The evolution in knowledge structure led to a high $T^t_{structure}$. The remaining popular child papers, namely 2 published in 2003, ‘Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets’ and ‘Learning precise timing with lstm recurrent networks’, arrived later and unanimously focused on LFCP’s idea. Together they contributed to the maturation of this new sub-field and partly maintained the heat level of the entire topic. The situation changed after 2010. The artificial intelligence frenzy pulled the topic into the spotlight. Thanks to the favorable background, the topic welcomed numerous popular child papers between 2013 and 2016, for instance, ‘Sequence to Sequence Learning with Neural Networks’ (S2SNN), ‘Neural Machine Translation by Jointly Learning to Align and Translate’ (NMTAT) and ‘Deep Residual Learning for Image Recognition’ (DRLIR). While inheriting the essence of LFCP, they brought along a considerable amount of new knowledge, introduced new sub-topics and produced the renaissance of this old topic (Fig. S33, S32(d,e,f)). Consequently, we see a slightly higher $T^t_{structure}$ around 2015 owing to the knowledge structure enrichment and a soar in $T^t$ starting from 2017. The long interval between the birth and the peak of impact and popularity makes us define this research field as an awakened topic.

There is a tiny cluster isolated from the majority of the skeleton tree (Fig. S33, in the top-middle of the current skeleton tree). This is because the topic contains several child papers published at the same time as or even a bit earlier than the pioneering work. Comparatively speaking, their work is not very closely related to that of the pioneering article. Therefore, together with some of their closest descendants, they were disconnected from the pioneering work during the skeleton tree construction.

Figure S31. Long short-term memory: topic statistics and knowledge temperature evolution. Table columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.

Figure S32. Long short-term memory: Skeleton tree evolution. (a) Skeleton tree until 1999; (b) Skeleton tree until 2001; (c) Skeleton tree until 2011; (d) Skeleton tree until 2013; (e) Skeleton tree until 2015; (f) Skeleton tree until 2017.

Table S12. Long short-term memory: Clustering effect example. First line is the parent paper and the rest children.

title | year
Developing a Long Short-Term Memory (LSTM) based model for predicting water table depth in agricultural areas | 2018
Stream-Flow Forecasting of Small Rivers Based on LSTM | 2020
Developing a Long Short-Term Memory-based signal processing method for Coriolis mass flowmeter | 2019
Direct Multistep Wind Speed Forecasting Using LSTM Neural Network Combining EEMD and Fuzzy Entropy | 2019
Dynamic neural network modelling of soil moisture content for predictive irrigation scheduling | 2018
SMArtCast: Predicting soil moisture interpolations into the future using Earth observation data in a deep learning framework | 2020
Short-Term Streamflow Forecasting for Paraíba do Sul River Using Deep Learning | 2019
Synthetic well logs generation via Recurrent Neural Networks | 2018
Reservoir Facies Classification using Convolutional Neural Networks | 2019
Comparative applications of data-driven models representing water table fluctuations | 2019

title | year
FiLM: Visual Reasoning with a General Conditioning Layer | 2018
LEARNING TO COLOR FROM LANGUAGE | 2018
Feature-wise transformations | 2018
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing | 2019
A Dataset and Architecture for Visual Reasoning with a Working Memory | 2018
Cycle-Consistency for Robust Visual Question Answering | 2019
On Self Modulation for Generative Adversarial Networks | 2019
Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation | 2019
TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning | 2019
Predicting Taxi Demand Based on 3D Convolutional Neural Network and Multi-task Learning | 2019

Now we examine the heat distribution within the topic (Fig. S33). The pioneering work remains the only heat source so far. Although this topic has a long history, its flourishing took place only a few years ago. It needs more time for heat to diffuse thoroughly within the topic. That is why most popular child papers have a node knowledge temperature around or a bit above average. At present, most of the hottest articles are located around the pioneering work in the central cluster. The knowledge temperature decline from the core to the ends is obvious. This corresponds with the general rule "the older the hotter" (Fig. 5(k)). Note that the blue nodes surrounding the pioneering work and popular child papers in non-trivial clusters are papers with few or no in-topic citations. The low average temperature for the oldest papers is due to their loose connection to the topic majority, as they were published no later than the pioneering work and have had few child papers within the topic. However, even if we leave these papers aside, age is no guarantee of a bigger impact and popularity. For instance, 2 popular child papers of LFCP are slightly hotter than LFCP itself. They are ‘Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets’, published in 2003, and ‘Modeling systems with internal state using evolino’, published in 2005. Both are coloured orange-red. Similarly, article ‘Generating Text with Recurrent Neural Networks’, published in 2011, is also slightly colder than its child, ‘Understanding the exploding gradient problem’, which was published in 2012. Their temperature difference is mainly owing to their research focus, as is reflected by their distinct citation patterns. These counter examples also illustrate that the general rule "the more influential the hotter" is weak (Fig. S49(k)).

We find the knowledge temperature evolution of LFCP particularly interesting. Its knowledge temperature dropped from 6.53 to 5.08 between 2001 and 2005. The decrease rate is greater than that of the topic knowledge temperature. This is because its followers had little development, thus overall the bundle led by LFCP had a slower development than the entire topic. Its temperature has been on the rise since 2007. In particular, the increase has greatly accelerated since 2015. We attribute its surge to the arrival of several popular child papers published between 2014 and 2016: S2SNN (2014), ‘Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling’ (2014), NMTAT (2015) and DRLIR (2016) (Fig. S33). Their instantaneous popularity has brought LFCP back to scientists’ attention. Recall that these papers also contributed a lot to the knowledge temperature leap of the entire topic starting from 2017.

We observe in addition a certain clustering effect in the skeleton tree. For example, almost all child papers of ‘Developing a Long Short-Term Memory (LSTM) based model for predicting water table depth in agricultural areas’ deal with earth science and agriculture, and ‘FiLM: Visual Reasoning with a General Conditioning Layer’ leads a handful of articles specialising in visual reasoning (Table S12). We also identify some bundles dealing with energy forecast and financial trading. All these observations confirm the effectiveness of our skeleton tree extraction algorithm. Moreover, these bundles were born no earlier than 2018, thus they are also good illustrations of some of the latest research hotspots in the topic.

S3.3.2 Particle swarm optimization

The topic gained popularity and expanded its impact steadily from its birth until around 2004, largely through the joint efforts of the pioneering work and several well-developed child papers published before 2000, namely ‘A modified particle swarm optimizer’, ‘Empirical study of particle swarm optimization’ and ‘Parameter Selection in Particle Swarm Optimization’. These prominent child papers also laid the foundation of the skeleton tree (Fig. S35). Another 2 influential younger child papers, ‘Comparing inertia weights and constriction factors in particle swarm optimization’ published in 2000 and ‘The particle swarm - explosion, stability, and convergence in a multidimensional complex space’ published in 2002, opened up a smaller sub-topic, which is visualized as the smaller major arm that extends from the central cluster. Their arrival ensured the topic’s thriving in its first 10 years, which is reflected by a rising $T^t_{growth}$ and a relatively high $T^t_{structure}$ during that period. In comparison, nothing remarkable happened in the following 5 years. Papers published during this period simply extended the established sub-topics. As a result, $T^t$ and its components stagnated (Fig. S34). Next, the machine learning wave revitalized the topic. Starting somewhere between 2010 and 2013, novel research focuses were derived from the older sub-topics, and some of them have already seen a certain development (Fig. S36(e,f)). This phenomenon is illustrated by the increasingly rich end structure of the skeleton tree. In addition, the annual publication number reached a record high in 2014. This trend resulted in a surge of $T^t$ shortly after. As the trend is now cooling down, so is the topic. Overall, this is a topic woken up by the AI boom.

There is a small cold cluster detached from the topic majority (top-right of Fig. S36(f)). This cluster is led by the popular child paper ‘A new optimizer using particle swarm theory’, published in the same year as the pioneering work. The two papers therefore probably have different focuses even though their ideas bear a resemblance. Their divergence causes their separation in the skeleton tree and their distinct knowledge temperatures. The separated skeleton tree also accords with the topic’s galaxy map representation, where the topic seems to be split into 2 parts (Fig. S35).

Figure S33. Long short-term memory: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 1000 in-topic citations are labelled by title in the skeleton tree. Except for the pioneering work, the size of the corresponding nodes is amplified 3 times.

Table columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.

Figure S34.

Particle swarm optim: topic statistics and knowledge temperature evolution

Figure S35. Particle swarm optim: Galaxy map and current skeleton tree. Papers with more than 1700 in-topic citations are labelled by title in the skeleton tree. Except for the pioneering work, the size of the corresponding nodes is amplified 5 times.

Figure S36. Particle swarm optim: Skeleton tree evolution. (a) Skeleton tree until 1998. (b) Skeleton tree until 2001. (c) Skeleton tree until 2004. (d) Skeleton tree until 2007. (e) Skeleton tree until 2010. (f) Skeleton tree until 2016.

title | year
A self-generating fuzzy system with ant and particle swarm cooperative optimization | 2009
ANFIS modelling of a twin rotor system using particle swarm optimisation and RLS | 2010
Improving fuzzy knowledge integration with particle swarm optimization | 2010
Designing Fuzzy-Rule-Based Systems Using Continuous Ant-Colony Optimization | 2010
Fuzzy Neural Networks Learning by Variable-Dimensional Quantum-behaved Particle Swarm Optimization Algorithm | 2013
Modeling and OnLine Control of Nonlinear Systems using Neuro-Fuzzy Learning tuned by Metaheuristic Algorithms | 2014

Table S13. Particle swarm optim: Clustering effect example. The first line is the parent paper and the rest are its children.

Now we closely examine the internal heat distribution together with the latest skeleton tree (Fig. S35). After 25 years of development, the heat has already fully diffused to the entire topic, as most popular child papers that founded recent research focuses have a knowledge temperature above average. They are the topic’s heat sources. Node knowledge temperature clearly decreases globally as articles are located farther away from the multiple research centers. This fits the general rule "the older the hotter" (Fig. 5(l)). Note that the colder average knowledge temperature among the oldest articles is caused by the "cold" popular child paper mentioned in the previous paragraph and the relatively independent research branch it leads. This child paper is also responsible for the drastic average temperature plunge among the most-cited papers (Fig. S49(l)). Besides, the blue nodes that surround the pioneering work and popular child papers in non-trivial clusters are papers with few or no in-topic citations. However, the general rule is violated even if we do not consider this "cold" research branch. For example, ‘Path planning for mobile robot using the particle swarm optimization with mutation operator’ is slightly colder than its child paper ‘Classic and Heuristic Approaches in Robot Motion Planning A Chronological Review’. The former is coloured yellow-orange and the latter orange. Their temperature difference is mainly due to their different research focuses, which is reflected by their distinct citations. Similarly, the paper ‘Using neighbourhoods with the guaranteed convergence PSO’ is also colder than its child paper ‘A guaranteed convergence dynamic double particle swarm optimizer’. The former is coloured orange and the latter orange-red. These counterexamples illustrate that the general rule "the older the hotter" is not robust.

We also observe a certain clustering effect in the skeleton tree. For example, almost all child papers of ‘A self-generating fuzzy system with ant and particle swarm cooperative optimization’ deal with fuzzy rules (Table S13). This confirms the effectiveness of our skeleton tree extraction algorithm.
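The parent and child comparisons in this section amount to scanning the skeleton tree for edges whose child node is hotter than its parent. The following is a minimal sketch of such a scan, not the authors' code. The function name, the `parent_of` mapping and every temperature value are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def hotter_children(parent_of: Dict[str, str],
                    temperature: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs in which the child is strictly hotter.

    parent_of maps each non-root node of a skeleton tree to its parent.
    temperature maps every node to its node knowledge temperature.
    Both inputs are assumed to be given; this sketch does not compute them.
    """
    violations = []
    for child, parent in parent_of.items():
        if temperature[child] > temperature[parent]:
            violations.append((parent, child))
    return sorted(violations)


# Hypothetical toy tree: A is the root, B and C are its children, D follows B.
parent_of = {"B": "A", "C": "A", "D": "B"}
# Hypothetical temperatures, not taken from the paper.
temperature = {"A": 5.0, "B": 4.0, "C": 5.5, "D": 4.2}

violations = hotter_children(parent_of, temperature)
print(violations)
```

Each returned pair is a candidate counterexample to "the older the hotter", to be checked against the citation patterns of the two papers as done in the text.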

S3.4 Rise-fall-cycle topics

S3.4.1 On random graphs, I

As shown by $T^t$ and $T^t_{growth}$, the impact and popularity evolution of this topic is somewhat complicated (Fig. S37). The publication of the popular child paper ‘On the evolution of random graphs’ (OERG) in 1984 brought the first boom in the 1980s. This article combined its ancestors’ ideas and successfully fused the previously separated parts of the skeleton tree due to an atypical citation from an older article, ‘On the existence of a factor of degree one of a connected random graph’ (Fig. S38(b,c)). This merge is the first significant evolution in knowledge structure and thus led to a spike in $T^t_{structure}$. Afterwards, the topic went relatively silent in the 1990s before a group of popular child papers arrived between 2001 and 2003. Among these articles, ‘Random graphs with arbitrary degree distributions and their applications’ published in 2001 non-trivially furthered the study of OERG and introduced a new research focus into the topic, as illustrated by the emergence of a third cluster in the skeleton tree (Fig. S38(f,g)). Its followers, the popular child papers ‘Evolution of networks’ published in 2002 and ‘The Structure and Function of Complex Networks’ published in 2003, extended its idea and created several new research sub-fields. That is why we observe some splits derived from the young cluster (Fig. S38(g)). They successfully attracted a lot of attention in a short time, and the topic has witnessed an accelerated expansion since around 2000. Together with their contribution to the topic knowledge pattern, this led the topic to another boom around 2010. Later, the topic kept its activity thanks to several young promising papers, including ‘Measurement and analysis of online social networks’ published in 2007, ‘Community detection in graphs’ published in 2010 and ‘Catastrophic cascade of failures in interdependent networks’ published in 2010. Although they opened up several new research orientations, there has not been a substantial subsequent development, and the branches led by them remain small in comparison to the principal clusters (Fig. S38(h,f)). Consequently, they have mostly helped maintain the topic’s visibility and its stable impact.

Now we closely examine the internal heat distribution together with the latest skeleton tree (Fig. S39). The topic has a long development history. In each period, new research focuses emerged (each row of Fig. S38 shows one period). Today, we see 3 major research focuses, and their founders are all heat sources. As articles are located farther away from the pioneering paper or the sub-topic centers, their node knowledge temperature decreases globally. The blue nodes that surround the pioneering work and popular child papers in the main clusters are papers with few or no in-topic citations. Generally speaking, older papers are hotter than younger ones (Fig. 5(m)). In comparison with other scientific topics, knowledge temperature fluctuates more among the "middle-aged" papers. This phenomenon is in line with the ups and downs the topic experienced during their publication period. Besides, we also observe a general rule "the more influential the hotter" in the topic (Fig. S49(m)), as the most-cited child papers are among the hottest articles. However, this rule is only robust for the most eminent child papers.

We also observe a certain clustering effect in the skeleton tree (Table S14). For example, all child papers of ‘False Beliefs in Unreliable Knowledge Networks’ probe into knowledge networks. This confirms the effectiveness of our skeleton tree extraction algorithm. Moreover, this small group was born in 2017, suggesting that its research focus, knowledge networks, may be one of the latest hotspots within the topic.

Table columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.

Figure S37. On random graphs: topic statistics and knowledge temperature evolution

Figure S38. On random graphs: Skeleton tree evolution. (a) Skeleton tree until 1974. (b) Skeleton tree until 1979. (c) Skeleton tree until 1984. (d) Skeleton tree until 1994. (e) Skeleton tree until 1999. (f) Skeleton tree until 2004. (g) Skeleton tree until 2009. (h) Skeleton tree until 2014. (i) Skeleton tree until 2019.

Figure S39. On random graphs: Galaxy map and current skeleton tree. Papers with more than 200 in-topic citations are labelled by title in the skeleton tree. Except for the pioneering work, the size of the corresponding nodes is amplified 5 times.

title | year
False Beliefs in Unreliable Knowledge Networks | 2017
Communication Policies in Knowledge Networks | 2018
Experts in Knowledge Networks: Central Positioning and Intelligent Selections | 2018
How to facilitate knowledge diffusion in complex networks: The roles of network structure, knowledge role distribution and selection rule | 2019

Table S14. On random graphs: Clustering effect example. The first line is the parent paper and the rest are its children.

S3.4.2 Collective dynamics of ‘small-world’ networks

As shown by $T^t$, although the topic is heating up thanks to a robust knowledge accumulation, it has experienced multiple ups and downs during the past 20 years due to short-term popularity fluctuations (Fig. S40). This topic has welcomed 2 waves of popular child papers, the first arriving between its birth and 2003 and the second published around 2009 and 2010. The oldest popular articles, namely ‘Emergence of Scaling in Random Networks’ (ESRN) published in 1999 in Science, ‘Exploring complex networks’ published in 2001 in Nature and ‘Community structure in social and biological networks’ published in 2002, shaped the fundamentals of the topic knowledge structure together with the pioneering work by 2007 (Fig. S41(c), S42). Their substantial contribution to knowledge quantity and diversity led to a fast rise in both $T^t_{growth}$ and $T^t_{structure}$. As a result, the topic reached its first peak around 2007. In the following years, the short-term exposure increase brought by these eminent child papers gradually wore off and few child papers emerged as rising stars. The topic development during this period was primarily a fortification of its existing knowledge architecture. That is why the topic cooled down slightly between 2007 and 2010 despite a robust topic expansion and an ongoing accumulation of useful information. It was also during this down period that the younger popular child papers were published. Some of them, including ‘Complex brain networks: graph theoretical analysis of structural and functional systems’ published in 2009 and ‘Complex network measures of brain connectivity: Uses and interpretations’ published in 2010, introduced new research sub-fields closely related to the idea of the pioneering work. They both formed a non-trivial branch extending directly out of the central cluster (Fig. S41(e,f)). Others continued to enrich the existing research fields created by former eminent child papers. For example, ‘Emergence of Scaling in Random Networks’ demonstrated an exceptional capability to attract substantially more subsequent works even 10 years after its publication, thanks to the explosive growth of social networks. The new knowledge extension and the lasting refinement of the entire knowledge framework are portrayed by a flourishing topic skeleton tree with multidimensional development and a steadily rising $T^t$ until 2016, the year when the topic hit its second peak. While the first golden age is essentially owing to a rapid internal growth, the second streak is largely propelled by favorable social trends, especially the prevalence of online social networks and the popularization of brain science and neuroscience. Recently, the short-term focus benefit has been dying out and no remarkable progress has matured enough to cause a stir. Thus the topic is now seeing a small slip.

Now we closely examine the internal heat distribution together with the latest skeleton tree (Fig. S42). All popular child papers have a knowledge temperature above average. This shows that heat diffusion within the topic has been completed after over 20 years of development. Most research focuses derived from the original ideas of the pioneering work have had some substantial development. Together they make up the majority of heat sources within the topic. Besides, we also spot a few atypical heat sources. They are articles that connect non-trivial research directions in the skeleton tree. For example, the paper ‘Combatting maelstroms in networks of communicating agents’ published in 1999 connects the entire left research branch and the central cluster led by the pioneering work. It does not have any direct followers in the skeleton tree, but it is the hottest node and its large structure entropy suggests that it is important to the entire knowledge framework. Its value lies exclusively in the enlightenment it provides. As articles are located farther away from these heat sources, their node knowledge temperature decreases. This accords with the general rule "the older the hotter" (Fig. 5(n)). Note that the average temperature for the oldest papers is not the highest. This is due to the presence of 3 "cold" articles published in the same year as the pioneering work. They either hardly inspired any subsequent works or failed to attract the attention of recent research. Besides, the blue nodes that surround the pioneering work and the most popular child papers in principal clusters in the current skeleton tree are papers with little or no in-topic development. However, the general rule is violated even if we set aside the oldest articles. For example, ESRN is slightly colder than its child papers, ‘The large-scale organization of metabolic networks.’ published in 2000 in Nature and ‘Classes of small-world networks’ published in 2000. Both are coloured red while ESRN is coloured orange-red. The temperature difference is mainly due to their different research focuses, as reflected by their distinct citations. This counterexample also illustrates that the general rule "the more influential the hotter" is weak (Fig. S49(n)). Last but not least, we find that most articles published in top journals such as Science and Nature have high knowledge temperatures and numerous citations. This accords with the prior study which points out the boosting effect of renowned journals on articles.

We also observe a certain clustering effect in the skeleton tree (Table S15). This confirms the effectiveness of our skeleton tree extraction algorithm. Moreover, these newly-formed small groups are very young, suggesting that their research focus may be among the latest hotspots within the topic.

Table columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.

Figure S40. small-world: topic statistics and knowledge temperature evolution

Figure S41. small-world: Skeleton tree evolution. (a) Skeleton tree until 2001. (b) Skeleton tree until 2004. (c) Skeleton tree until 2007. (d) Skeleton tree until 2010. (e) Skeleton tree until 2013. (f) Skeleton tree until 2016.

title | year
Robustness of Synchrony in Complex Networks and Generalized Kirchhoff Indices | 2018
Impact of network topology on the stability of DC microgrids | 2019
The key player problem in complex oscillator networks and electric power grids: Resistance centralities identify local vulnerabilities | 2019
Quantifying transient spreading dynamics on networks | 2019
Global robustness versus local vulnerabilities in complex synchronous networks | 2019

title | year
Multiplex lexical networks reveal patterns in early word acquisition in children | 2017
Multiplex model of mental lexicon reveals explosive learning in humans | 2018
How children develop their ability to combine words: a network-based approach | 2019
Multiplex model of mental lexicon reveals explosive learning in humans | 2018
Applying network theory to fables: complexity in Slovene belles-lettres for different age groups | 2019
Knowledge gaps in the early growth of semantic feature networks | 2018
The orthographic similarity structure of English words: Insights from network science | 2018
Node Ordering for Rescalable Network Summarization (or, the Apparent Magic of Word Frequency and Age of Acquisition in the Lexicon) | 2018
spreadr: An R package to simulate spreading activation in a network | 2019

Table S15. small-world: Clustering effect example. The first line is the parent paper and the rest are its children.

S3.4.3 Latent dirichlet allocation

As shown by $T^t$, the impact and popularity of the topic fluctuate over time. After reaching the first peak around 2010, this field cooled down for a while before it became trendy again around 2019 (Fig. S43). In the long run, the topic has an increasing impact. The rise-and-fall pattern is largely due to short-term popularity fluctuations, as demonstrated by the variation of $T^t_{structure}$. In its first 10 years, the topic developed 3 principal research sub-fields, as illustrated by the skeleton tree (Fig. S44(a,b,c)). The advancement is largely owing to the arrival of several influential child papers within the topic around 2005 and 2006: ‘A Bayesian hierarchical model for learning natural scene categories’, ‘Hierarchical Dirichlet Processes’ and ‘Dynamic topic models’ (Fig. S45). They increased the exposure of this topic, facilitated a rapid knowledge accumulation and greatly enriched the knowledge structure. Consequently, the topic had its first golden period. Afterwards, the sweeping trend of machine learning helped the topic gain more attention and fame. A new wave of popular papers joining between 2009 and 2012 gradually manifested their attractiveness, namely ‘Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora’, ‘Reading Tea Leaves: How Humans Interpret Topic Models’ and ‘Probabilistic topic models’. They extended the former research focuses and provided inspiration for novel, promising ideas. This is captured by the increasingly complex major branches in the skeleton tree (Fig. S44(e,f)). In particular, this wave immediately brought a large amount of attention to the topic and created a second period of glory.

Now we closely examine the internal heat distribution together with the latest skeleton tree (Fig. S45). After over 20 years of development, the original and recent research ideas have all had a rich development. The heat has therefore diffused to every corner of the skeleton tree with the help of popular child papers. Apart from multiple heat sources in the core of research branches, we also identify some of the hottest articles between principal clusters. For example, the paper ‘Variational extensions to EM and multinomial PCA’ published in 2002 connects the entire right branch and the central cluster. It does not have many direct followers within the topic, but it is the hottest node and it has a large structure entropy due to its knowledge bridging value. As articles are located farther away from these "hit" papers, their node knowledge temperature decreases. This accords with the general rule "the older the hotter" (Fig. 5(o)). The blue nodes that surround the pioneering work and popular child papers in central parts are papers with few or no in-topic followers. However, there are exceptions. The paper ‘You Are What You Tweet: Analyzing Twitter for Public Health’ (YWTPH) published in 1998 is colder than its child papers, ‘Using Twitter for breast cancer prevention: an analysis of breast cancer awareness month’ published in 2013 and ‘Global Disease Monitoring and Forecasting with Wikipedia’ published in 2014. The latter two are coloured orange-red while YWTPH is coloured yellow-green. Their temperature difference lies primarily in their different research focuses, reflected by their distinct in-topic citations. This counterexample also suggests that another general rule, "the more influential the hotter", is not robust (Fig. S49(o)).

We also observe a certain clustering effect in the skeleton tree (Table S16). This confirms the effectiveness of our skeleton tree extraction algorithm. Moreover, these mini-groups are very young, suggesting that their research focus may be among the latest hotspots within the topic.

Figure S42. small-world: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 2000 in-topic citations are labelled by title in the skeleton tree. Except for the pioneering work, the size of the corresponding nodes is amplified 6 times.

Table columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.

Figure S43. LDA: topic statistics and knowledge temperature evolution

S3.4.4 A FUNDAMENTAL RELATION BETWEEN SUPERMASSIVE BLACK HOLES AND THEIR HOST GALAXIES

The knowledge temperature evolution of this topic is quite unique. Not only does $T^t$ manifest multiple local peaks every 6 years, but more importantly it is $T^t_{structure}$ that dominates the ups and downs of $T^t$ (Fig. S46). As for $T^t_{growth}$, its increase in the early days is due to the continual arrival of popular child papers within the topic until 2006. They brought a steady inflow of new knowledge that enriched the topic content. Almost none of the popular papers published after 2008 has so far achieved a comparable development.

The skeleton tree of this topic is also very special in that there are far fewer child papers surrounding the pioneering work, the biggest red node situated in the bottom-right, than its prominent descendants, ‘A Relationship between nuclear black hole mass and galaxy velocity dispersion’ (RNBHGVD) and ‘THE SLOPE OF THE BLACK HOLE MASS VERSUS VELOCITY DISPERSION CORRELATION’ (Fig. S48). In fact, the pioneering work has never been the gravity center, even at the very beginning (Fig. S47(a)). Great structural changes took place between 2001 and 2003. Firstly, we observe a significant development of 2 research directions. This is portrayed by the fast-growing left and right branches that derive from the cluster surrounding the renowned child paper RNBHGVD. The roots of these two primary branches, ‘On Black Hole Masses and Radio Loudness in Active Galactic Nuclei’ and ‘Black Hole Mass Estimates from Reverberation Mapping and from Spatially Resolved Kinematics’, established their indispensable role in passing on knowledge. Secondly, the smaller branch pointing up-right in the middle of these 2 branches was initially led by the paper ‘COOLING FLOWS AND QUASARS. II. DETAILED MODELS OF FEEDBACK-MODULATED ACCRETION FLOWS’ (CFQMFMAF) in 2001. However, after 2 years this paper lost all of its followers in the skeleton tree to the paper ‘The correlation between black hole mass and bulge velocity dispersion in hierarchical galaxy formation models’ published 1 year earlier (Fig. S47(b)). The latter only had 2 direct followers in 2001. The structural transformation probably occurred because the articles inspired by the paper ‘A Theoretical Model for the Mbh-σ Relation for Supermassive Black Holes in Galaxies’ (TMMRSBHG), the best-developed child paper of CFQMFMAF, during this period better characterise TMMRSBHG’s research interests with their citation patterns. The additional citation information led to a distinct judgment about the most primordial inspiration source and thus caused the shift in the skeleton tree. Between 2003 and 2009, especially between 2005 and 2009, the 3 principal research branches continued to grow. 2 out of the 3 ramified at their ends, suggesting the formation of new research sub-topics. The third $T^t_{structure}$ spike appeared around 2015. 2 out of the 3 principal branches manifested their lasting vigor by a non-trivial evolution at their ends, especially between 2011 and 2015. Furthermore, by the end of this period, one principal branch had developed so well that it not only overshadowed the other 2 main branches but also claimed the core of the skeleton tree. Its rapid growth is partly thanks to the arrival of 2 popular child papers in 2013, ‘REVISITING THE SCALING RELATIONS OF BLACK HOLE MASSES AND HOST GALAXY PROPERTIES’ and ‘Coevolution (Or Not) of Supermassive Black Holes and Host Galaxies’, even though they themselves do not occupy strategic spots on the branch. Their direct contribution is rather implicit. But together with others they helped complete an obvious gravity shift in knowledge architecture, which is reflected by a surge in $T^t_{structure}$.

Now we closely examine the internal heat distribution and the latest skeleton tree (Fig. S48). The heat has already diffused uniformly to the major research sub-directions, as most popular child papers have a knowledge temperature above average and some have even become heat sources. The periphery of the skeleton tree is clearly colder than the central parts. The blue nodes that surround the pioneering work and popular child papers in central parts are papers with few or no in-topic citations. This observation accords with the general rule "the older the hotter" (Fig. 5(p)). The small drop in average knowledge temperature among the oldest papers is due to the presence of several papers published in 2001 that gave little inspiration to subsequent research. However, there are exceptions even if we ignore these old "cold" articles. For instance, the paper ‘A unified model for AGN feedback in cosmological simulations of structure formation’ published in 2007 is slightly colder than its child paper ‘The impact of radio feedback from active galactic nuclei in cosmological simulations : formation of disc galaxies’ published in 2008. The former is coloured yellow-orange whereas the latter is coloured orange. Their difference in heat level is mainly due to their slightly different research focuses, judging from their partially overlapping citations. For a similar reason, the paper ‘AMUSE-Virgo. I. Supermassive Black Holes in Low-Mass Spheroids’ is also slightly colder than its child paper ‘Candidate Active Nuclei in Late-Type Spiral Galaxies’. These counterexamples indicate that the other general rule, "the more influential the hotter", is weak (Fig. S49(p)).

We also observe a certain clustering effect in the skeleton tree (Table S17). For example, all child papers of ‘Active galactic nuclei in the mid-IR: evolution and contribution to the cosmic infrared background’ in the current skeleton tree study active galactic nuclei (AGN). This confirms the effectiveness of our skeleton tree extraction algorithm.

Figure S44. LDA: Skeleton tree evolution. (a) Skeleton tree until 2004. (b) Skeleton tree until 2007. (c) Skeleton tree until 2010. (d) Skeleton tree until 2013. (e) Skeleton tree until 2016. (f) Skeleton tree until 2019.

title | year
The spread of true and false news online | 2018
Assessing the Readiness of Academia in the Topic of False and Unverified Information | 2019
Ginger Cannot Cure Cancer: Battling Fake Health News with a Comprehensive Data Repository | 2020
Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos | 2018
An opinion based cross-regional meteorological event detection model | 2019
Investigating Italian disinformation spreading on Twitter in the context of 2019 European elections | 2020

title | year
Automated Text Analysis for Consumer Research | 2018
Automated Text Analysis | 2019
Mining Product Relationships for Recommendation Based on Cloud Service Data | 2018
Text mining analysis roadmap (TMAR) for service research | 2020
Uniting the Tribes: Using Text for Marketing Insight | 2019

Table S16. Clustering effect example. The first line is the parent paper and the rest are its children.

Figure S45. LDA: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 700 in-topic citations are labelled by title in the skeleton tree. Except for the pioneering work, the size of the corresponding nodes is amplified 5 times.

Table columns: year, $|V^t|$, $|E^t|$, $n^t$, $V^t$, $UsefulInfo^t$, $T^t_{growth}$, $T^t_{struct}$, $T^t$.

Figure S46.

BLACK HOLES: topic statistics and knowledge temperature evolution

title | year
Active galactic nuclei in the mid-IR: evolution and contribution to the cosmic infrared background | 2006
The VVDS type-1 AGN sample: the faint end of the luminosity function | 2007
The cosmological properties of AGN in the XMM-Newton Hard Bright Survey | 2008
VARIABILITY AND MULTIWAVELENGTH-DETECTED ACTIVE GALACTIC NUCLEI IN THE GOODS FIELDS | 2011
A multi-wavelength survey of AGN in massive clusters: AGN distribution and host galaxy properties | 2014
Using AGN Variability Surveys to explore the AGN-Galaxy Connection | 2013

Table S17. Clustering effect example. First line is the parent paper and the rest children.

Figure S47. BLACK HOLES: Skeleton tree evolution. (a) Skeleton tree until 2001. (b) Skeleton tree until 2003. (c) Skeleton tree until 2005. (d) Skeleton tree until 2009. (e) Skeleton tree until 2011. (f) Skeleton tree until 2015.

Figure S48. BLACK HOLES: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 340 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes’ size is amplified by 5 times.

Figure S49. Relation between article in-topic citation and knowledge temperature, panels (a) to (p). Grey dotted horizontal line marks the topic knowledge temperature in 2020. Articles with no citation and the pioneering work are excluded.

A topic group is an ensemble of several closely related topics. During a certain period, topics in a group can manifest distinct popularity and impact changes. Some may prosper while others stagnate or go downhill. When this is the case, our forest helping mechanism allows thriving topics to donate a small fraction of their vigor to their dying siblings. The heat exchange among topic group members thus takes "background popularity and impact" into consideration to some extent. After forest helping, the knowledge temperatures of closely related topics evolve more similarly and correspond better to idea inheritance and development.
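The exact exchange rule is not given in this section, so the following is only a toy sketch of the idea that thriving topics donate a small fraction of their heat to declining siblings. Everything in it is an assumption made for illustration: the function name `forest_helping`, the criterion for "thriving" (temperature rose since the previous step), the fixed donation `fraction`, the equal split among receivers, and all numbers.

```python
from typing import Dict


def forest_helping(prev: Dict[str, float], curr: Dict[str, float],
                   fraction: float = 0.05) -> Dict[str, float]:
    """Toy heat exchange inside one topic group (assumed rule, see lead-in).

    prev and curr map topic names to knowledge temperatures at two
    consecutive time steps. Topics whose temperature rose donate
    `fraction` of their current temperature; the pooled heat is split
    equally among the topics whose temperature fell.
    """
    donors = [name for name in curr if curr[name] > prev[name]]
    receivers = [name for name in curr if curr[name] < prev[name]]
    adjusted = dict(curr)
    if not donors or not receivers:
        return adjusted  # nothing to exchange
    pool = 0.0
    for name in donors:
        gift = fraction * curr[name]
        adjusted[name] -= gift
        pool += gift
    share = pool / len(receivers)
    for name in receivers:
        adjusted[name] += share
    return adjusted


# Hypothetical temperatures for a two-topic group.
previous = {"large_topic": 4.0, "small_topic": 3.0}
current = {"large_topic": 5.0, "small_topic": 2.0}
adjusted = forest_helping(previous, current, fraction=0.1)
print(adjusted)
```

Under this assumed rule the total heat of the group is conserved and the two curves move closer together, which is the qualitative effect the paragraph describes.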

S3.5.1 wireless network group

The skeleton tree of the topic led by ‘Critical Power for Asymptotic Connectivity in Wireless Networks’ (CPACWN) reveals an indisputably intimate relation between itself and the topic led by ‘The capacity of wireless networks’ (CWN) (Fig. S12). As the most prominent child paper of CPACWN, CWN substantially extended CPACWN’s ideas and founded a new research focus. Its crucial role in the topic’s prosperity is also reflected by its high popularity and influence within the topic: it jointly inspired one third of the topic members, most of which were published during the flourishing period. Their similar knowledge temperature evolution also confirms their closeness. During forest helping, CPACWN’s topic donated some of its heat to CWN’s topic in the early days. This behavior models the promotion effect brought by CPACWN’s increasing impact and popularity. However, this did not help CWN’s topic much because it was already much bigger. After the adjustment, their knowledge temperature evolutions are more similar than before. Both topics were hottest in 2007 and 2008 (Fig. S50). This corresponds better with their individual development and inherent connection. In fact, CWN achieved such a huge success that it overtook its predecessor as the new authority in their domain in just a few years. The dominating size of CWN’s topic clearly makes it a better representative of background popularity and impact, which usually has a big influence on similar smaller topics. Therefore, the destiny of CPACWN’s topic is to some extent determined by the development of CWN’s topic. The rise and fall of CWN’s topic is thus an indicator of the flourishing of CPACWN’s topic.

Figure S50. wireless network group: knowledge temperature evolution before and after forest helping

S3.5.2 RNN gated unit group

‘Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling’ (GRU) introduced a new research focus and made a non-trivial contribution to the recent thriving of the topic led by ‘Long short-term memory’ (LSTM) (Fig. S33). In fact, nearly half of the papers that cite GRU also cite LSTM. Over the past 3 years, LSTM’s topic has developed substantially, with fast-growing impact and popularity thanks to a large number of new publications. In comparison, GRU’s topic showed signs of stagnation shortly after its initial glory. Today, the phenomenal size of LSTM’s topic supports LSTM’s claim to authority in the domain. As a result, the prosperity of LSTM’s topic is a good representative of background popularity and impact, which usually has a big influence on similar smaller topics. While GRU helped the flourishing of LSTM’s topic in its early days, it is now the turn of LSTM’s topic to help maintain the heat level of GRU’s topic (Fig. S51). A soaring background popularity and impact is favorable for the future development of GRU’s topic, at least in the short term. For this topic group, forest helping resembles a mechanism observed in nature: a mother tree shares nutrients with its child trees so as to give them a better chance of survival.

Figure S51. RNN gated unit group: knowledge temperature evolution before and after forest helping
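The statement that nearly half of the papers citing GRU also cite LSTM is a set-overlap measurement on the citation network. A minimal sketch of that computation follows; the function name `shared_citer_fraction` and the paper identifiers are hypothetical and serve only as an illustration, not as the procedure or data used in this work.

```python
def shared_citer_fraction(citers_a, citers_b):
    """Fraction of the papers citing A that also cite B.

    citers_a, citers_b -- iterables of paper identifiers
    Returns 0.0 when A has no citing papers.
    """
    a = set(citers_a)
    if not a:
        return 0.0
    return len(a & set(citers_b)) / len(a)


# Toy identifiers, not real citation data.
citing_gru = ["p1", "p2", "p3", "p4"]
citing_lstm = ["p2", "p4", "p9"]
overlap = shared_citer_fraction(citing_gru, citing_lstm)
```

The measure is asymmetric: the fraction of LSTM's citing papers that also cite GRU is generally different, which matters when one topic is much larger than the other.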

S3.5.3 word embedding group

‘Efficient Estimation of Word Representations in Vector Space’ (EEWRVS) is the most influential child paper in both the topic led by ‘A neural probabilistic language model’ (NPLM) and the topic led by ‘A unified architecture for natural language processing: deep neural networks with multitask learning’ (UANLP). Furthermore, EEWRVS’s topic is more than twice the size of NPLM’s and UANLP’s. EEWRVS has outperformed its parents and has established authority in this research field. The considerable size of EEWRVS’s topic makes it a good representative of background popularity and impact, which influences smaller topics within the research field. Owing to its close relationship with NPLM’s topic and UANLP’s topic, the booming of EEWRVS’s topic more or less increases their visibility and attracts research attention. Through forest helping, the "energy" from EEWRVS’s topic slows down the perishing of NPLM’s topic and UANLP’s topic (Fig. S52). The heat exchange models the boosting effect of the background, the larger research field to which the three topics belong.