Taking census of physics

Abstract

Over the past decades, the diversity of areas explored by physicists has exploded, encompassing new topics from biophysics and chemical physics to network science. However, it is unclear how these new subfields emerged from the traditional subject areas and how physicists explore them. To map out the evolution of physics subfields, here, we take an intellectual census of physics by studying physicists' careers. We use a large-scale publication data set, identify the subfields of 135,877 physicists and quantify their heterogeneous birth, growth and migration patterns among research areas. We find that the majority of physicists began their careers in only three subfields, branching out to other areas at later career stages, with different rates and transition times. Furthermore, we analyse the productivity, impact and team sizes across different subfields, finding drastic changes attributable to the recent rise in large-scale collaborations. This detailed, longitudinal census of physics can inform resource allocation policies and provide students, editors and scientists with a broader view of the field's internal dynamics.


Federico Battiston, Federico Musciotto, Dashun Wang, Albert-László Barabási, Michael Szell, and Roberta Sinatra

Department of Network and Data Science, Central European University, Budapest, 1051, Hungary
Kellogg School of Management, Northwestern University, Evanston, IL 60208, USA
Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL 60208, USA
Network Science Institute, Northeastern University, Boston, MA 02115, USA
Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02115, USA
Complexity Science Hub Vienna, Vienna, 1080, Austria
MTA KRTK Agglomeration and Social Networks Lendulet Research Group, Centre for Economic and Regional Studies, Hungarian Academy of Sciences, Budapest, 1094, Hungary
Department of Mathematics, Central European University, Budapest, 1051, Hungary
ISI Foundation, Torino, 10126, Italy

arXiv [physics.soc-ph], Jan

There was a time when polymaths like Galileo knew all the physics that was there to be known. Over the centuries, however, the body of knowledge spanned by physics exploded, encompassing topics as diverse as gravitational waves, graphene, or network science. As physics expanded in breadth and depth, physicists were forced to specialise, segmenting researchers into their narrow, specialised communities. How many physicists work in each subfield of physics today and how does each subdiscipline evolve? Into which subfield are physicists “born” and where do they migrate, if at all? Here we take an intellectual census of physicists, their activities and career trajectories, helping us understand the evolution of the field and gaining quantitative insights about several fundamental scientific processes, from resource allocation to the exchange of knowledge. Advances in this direction were limited by the challenge in answering two fundamental questions: 1) Who can be counted as a physicist? 2) How do we survey their activities? The recent availability of large datasets of scientific publications finally offers opportunities to tackle these questions by exploring the production patterns of the scientific population [2, 3]. Indeed, the close to complete publication records of all physicists allow us to reconstruct their subfields of study and career changes, offering quantitative footprints not just for the field of physics, but its intimate relation with the broader scientific community [4, 5].

Combining large-scale data on physics publications and citations with recent data and network science techniques, here we ask: What are the impact and productivity differences between subfields? As a physics student choosing my future specialty, how do I know which subfields are growing? As a funding agency, how do I compare early-career physicists from different subfields? As a journal editor, how many papers should I expect from each subfield and how do I compare their impact?

A census of physics subfields

To offer a data-driven answer to these questions [2, 3], we identify the relevant physics papers and citations within Web of Science (WoS). We start by selecting ∼ [5, 6] missing for example those published in interdisciplinary journals like Nature or Science, or papers published in journals of other disciplines but that are of direct relevance for the physics community. To map out the complete physics literature we then set out to detect physics papers by virtue of their patterns of citations among the other ∼47 million papers in WoS. A paper is a potential physics publication if its references and citations to the core physics literature are significantly higher than in a null model in which each paper’s citations are assigned randomly, regardless of a paper’s journal or research area. We identified ∼ ∼ between 1985 and 2015. We use this dataset to reconstruct the publication profile of 135,877 physicists with a persistent productivity between 1985 and 2015. See Box 1 and SI Section S3 for more details on the dataset curation and validation.

The first step in developing a census is to count the number of physicists working in each subfield. Such counting is, however, not straightforward, as physicists may contribute to publications in different subfields. We therefore associate each physicist with a primary subfield if the number of her publications in the subfield is higher, in a statistically significant manner, than expected for a typical physicist (Box 1 and SI Section S4). The obtained subfield demographics offer us a first summary statistic (Fig. 1a): we find that the largest subfield is CondMat (condensed matter physics) with more than , physicists, capturing 46% of the entire physicist population. It is followed by General ( , ), HEP (high energy physics, , ), Interdisc (Interdisciplinary physics, , ), Classical ( , ), Nuclear ( , ), AMO (Atomic and molecular physics, , ) and Astrophysics ( , ). Plasma is the smallest subfield of physics, with less than , researchers.

Given the highly specialised nature of the physics subfields, one might suspect that most physicists work in a single subfield. Yet, we find that highly specialised physicists are the exception rather than the rule: the majority of physicists (63%) are active in two or more subfields (Fig. 1b). This prompts us to ask: Which subfields have particularly low or high rates of specialisation? The differences between subfields are striking, defining two different groups (Fig. 1c): six subfields have less than specialised physicists. Among these subfields, Interdisc has less than 1% of specialised physicists, in line with the expectation that interdisciplinary physicists bridge multiple subfields. In contrast, the percentage of specialised physicists in CondMat, HEP and Nuclear is 42%, 34% and 25% respectively, at least an order of magnitude larger than in the other group of subfields. What drives the different levels of specialisation between subfields?

A physicist working in two or more subfields combines the collective know-how of these fields, a process deemed essential for novel discoveries in science. To understand which of the physics subfields cross-pollinate most significantly, we calculate the co-activities of individual physicists between each pair of subfields. Co-activities are defined by weighted links between subfields, where the weights measure the observed versus expected co-activities based on a randomised null model (SI Section S6). Starting with the highest weighted links, we plot the minimum number of links needed to have a connected network of subfields (Fig. 1d). The network reveals a non-trivial co-activity structure, clustering all physics subfields into three broader areas, 1) Interdisc and CondMat, 2) Classical, AMO, and Plasma, 3) HEP, Astro, and Nuclear, all held together by General. This research space captures the intellectual affinities between subfields, facilitating movements between close subfields, while limiting cross-pollination between distant ones like Interdisc and Nuclear. For example, the diversity of topics within CondMat and Classical and their adaptable approaches, like statistical mechanics applied to multiple systems composed of large numbers of entities, make it easier for those working in these subfields to take their tools to different disciplines. In contrast, more specialised subfields like HEP or Nuclear require their members to acquire familiarity with large-scale, long-term projects. While scientists working in such fields may have deep knowledge and expertise on the subject they specialise in, they face a greater burden that limits their ability to explore other areas. The observed network is similar to the citation network between subfields, showing that the flow of knowledge is captured through multiple metrics, both by paper citations and by the activities of individual physicists.

Birth, growth, and migration

Why are there such considerable differences in specialised physicists between similarly sized subfields, like Nuclear and Interdisc (Fig. 1a,b)? To understand this heterogeneity, we first assess the relative growth rate of each subfield over time, measuring the fraction of physicists entering a subfield every year (Fig. 2a). We find that the growth rates of Interdisc and Astro increased from a few percent in 1985 to over 20% and 27% respectively after 2010, substantially reshaping the physics landscape in recent years. An opposite trend characterises CondMat: while it had the largest share of new physicists in 1985, its share dramatically decreased over time, falling below 5% after 2010. HEP also displayed a receding trend just before 2010, but the spur of new research connected to the activity of the Large Hadron Collider in Geneva injected new forces into the field. In particular, HEP’s sharp peak in 2010 can be attributed to the first ATLAS and CMS publications (SI Section S7).

Figure 2a mixes together physicists who start their careers in a particular subfield with those who make career transitions to other subfields. There are remarkable examples of physicists who never changed their subfield, like Klaus von Klitzing, whose first publication was in CondMat, and who contributed over 500 papers to the subfield, earning him the Nobel Prize in 1985 for the discovery of the quantised Hall effect. In contrast, Rainer Weiss, best known for inventing the laser interferometric technique at the heart of LIGO, which earned him the Nobel Prize in 2017, published his first paper on an unrelated topic in AMO, “Magnetic Moments and Hyperfine-Structure Anomalies of Cs , Cs and Cs ”. To distinguish such different careers, we next systematically explore career transitions within physics, asking: Where are physicists “born”, and how do they “migrate” between subfields? When do these transitions typically occur?

Figure 2b shows how many physicists began their careers in each subfield (top rectangles). Remarkably, of the physicists began their careers by publishing in either CondMat, HEP, or Nuclear ( of all physicists start out in CondMat). These three subfields capture “curricular” physics topics, the natural ending points of many undergraduate courses, hence the typical starting point of research careers. General, covering topics of interest to a wide set of physicists, accounts for of first publications. In contrast, only of physicists started publishing in Interdisc, and as low as 3% began in Astro. As Interdisc integrates other disciplines, it might be difficult to start out as an Interdisc physicist; the low percentage of Astro starts may be rooted in the fact that traditionally it has not been a “curricular” subfield.

Box 1

Identifying subfields

We classify papers into 9 subfields, based on the 1-digit Physics and Astronomy Classification Scheme (PACS) by the American Physical Society (APS):

• General: Mathematical Methods, Quantum Mechanics, Relativity, Nonlinear Dynamics, Metrology
• HEP: The Physics of Elementary Particles and Fields
• Nuclear: Nuclear Structure and Reactions
• AMO: Atomic and Molecular Physics
• Classical: Electromagnetism, Optics, Acoustics, Heat Transfer, Classical Mechanics, and Fluid Dynamics
• Plasma: Physics of Gases, Plasmas, and Electric Discharges
• CondMat: Structural, Mechanical, and Thermal Properties; Electronic Structure, Electrical, Magnetic, and Optical Properties
• Interdisc: Interdisciplinary Physics and Related Areas of Science and Technology
• Astro: Astrophysics, Astronomy, and Geophysics

PACS were consistently used in papers published in APS journals between 1985 and 2015 (SI Section S2). Using an algorithm that evaluates the patterns of citations and references between papers, we propagate subfield labels from APS papers to other papers: if the fraction of references and citations between a given paper and papers in a particular subfield is larger than expected by the null model, the paper is assigned to that subfield. A paper may be assigned to multiple subfields, in line with APS papers reporting multiple PACS. In panel a) we show an example of an unclassified paper which references papers in CondMat, Plasma and Astro, and which is cited by CondMat, Astro and another publication still lacking a PACS. The publication is first assigned to CondMat and then to Astro, but not to Plasma, as it lacks statistically significant links to the subfield. The algorithm is run iteratively until convergence for each subfield, helping us associate at least one subfield to 1,137,670 papers (SI Section S3).

Assigning physicists to subfields

We analyse all careers with at least 5 labeled papers between 1985 and 2015, capturing the careers of 135,877 physicists. We consider a physicist working in a subfield if her share of publications in the subfield is higher than that of the average physicist. The statistical criterion we used guarantees that each scientist is assigned to at least one subfield, and takes into account the different sizes of subfields. As an example, we show the result of the criterion applied to the career of Stephen Hawking in panel b). In the physics dataset Hawking has 124 papers associated to different subfields. Of these subfields, only General (95 papers) and Astro (77 papers) are assigned to the physicist through the statistical criterion, whereas HEP (23 papers) and Classical (1 paper) are not statistically significant, which is consistent with Hawking being known as a theoretical physicist and cosmologist. For validation and further methods see SI Sections S3, S4, S5.

[Box 1 figure. Panel a: 1) Unclassified paper; 2) Propagation of CondMat (significant); 3) Propagation of Plasma (not significant); 4) Propagation of Astro (significant). Panel b: subfield assignment for the career of Stephen Hawking.]
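A minimal sketch of such an assignment rule is given below. The exact criterion is described in SI Section S4 and is not reproduced here; the sketch assumes a one-sided binomial test of a physicist's paper count in a subfield against the share of an average physicist, with a fallback to the most over-represented subfield so that everyone receives at least one label. The paper counts are those quoted for Hawking in this box, while the baseline shares, the threshold `alpha` and all function names are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def assign_subfields(counts, n_papers, baseline, alpha=0.01):
    """Return the subfields in which a physicist is significantly active.

    counts:   papers of this physicist per subfield (a paper may count
              for several subfields).
    n_papers: total number of labelled papers of the physicist.
    baseline: assumed share of an average physicist's papers per subfield.
    """
    significant = [
        s for s, k in counts.items()
        if binomial_tail(k, n_papers, baseline[s]) < alpha
    ]
    if significant:
        return sorted(significant)
    # Fallback (assumption): keep the most over-represented subfield so that
    # every physicist is assigned to at least one subfield.
    best = max(counts, key=lambda s: counts[s] / (n_papers * baseline[s]))
    return [best]


# Paper counts quoted in Box 1 for Stephen Hawking (124 papers in total).
hawking = {"General": 95, "Astro": 77, "HEP": 23, "Classical": 1}
# Hypothetical baseline shares, chosen only for illustration.
baseline = {"General": 0.25, "Astro": 0.10, "HEP": 0.20, "Classical": 0.15}

assigned = assign_subfields(hawking, 124, baseline)
print(assigned)
```

With these illustrative baselines, the 23 HEP papers fall close to the 24.8 expected for an average physicist and are therefore not significant, whereas General and Astro are strongly over-represented. Because each count is compared with a subfield-specific baseline, the rule accounts for the different sizes of subfields.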

The links of Fig. 2b capture the significant flows between subfields, linking the subfield where a physicist published her first paper to the subfields that best characterised her later career (SI Section S6). This diagram indicates that CondMat is the starting point for many physicists who later specialised in Interdisc, Classical, and General. HEP and Nuclear tend to swap researchers while feeding talents into Astro, a pattern that may be rooted in the fact that all three subfields study radiation or nuclear and subnuclear processes. We find that most Interdisc physicists did not start their career there, but migrated from CondMat and General, consistent with the hypothesis that one needs to acquire expertise in at least two fields before being able to bring them together. Finally, Plasma and Astro welcome physicists with many different backgrounds, but rarely feed into other subfields. The diversity of the incoming flows to Plasma and Astro suggests their accessibility to physicists with many different backgrounds.

We also measure the average time it takes to transition to a different subfield, captured by the vertical axis of Fig. 2b. Once again, HEP, Nuclear and CondMat top the list: physicists who did not start their career in these subfields tend to transition towards them the earliest, typically by the third or fourth year of their research career. The opposite trend was observed for Interdisc and Astro, which not only have the highest transition rates among subfields, but are also characterised by the longest time to transition. Indeed, on average a physicist publishes her first paper on these two topics to years into her career, roughly double the transition time towards HEP, Nuclear and CondMat. Interdisc displays a late switch, consistent with the hypothesis that it takes time to gather expertise in multiple fields. Similarly, physicists tend to switch to Astro typically after a relatively long experience in HEP.

The flow diagram of Fig. 2b helps us better understand the research space captured by Fig. 1d. For instance, in the bottom right triple, HEP plays the leading role in producing physicists who transition to its tightly connected subfields, Nuclear and Astro. In the top two nodes of the network, CondMat is the main force feeding Interdisc. The observed widespread career transitions may reflect potential benefits to the whole field, cross-pollinating one physics community with ideas and methods developed by a different subfield [8, 9].
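The transition times discussed in this section can be illustrated with a short sketch. It is a simplification under stated assumptions: each career is a list of (year, subfield) publication records, the "birth" subfields are those of the earliest publication year, and the transition time to any other subfield is the number of years between the career start and the first paper in that subfield. The significance filtering of flows (SI Section S6) is not modelled, and the function names and toy careers are hypothetical.

```python
from collections import defaultdict


def first_years(career):
    """Map each subfield to the year of the physicist's first paper in it."""
    first = {}
    for year, subfield in career:
        if subfield not in first or year < first[subfield]:
            first[subfield] = year
    return first


def mean_transition_times(careers):
    """Average years from career start to the first paper in each
    subfield that was entered after the start of the career."""
    delays = defaultdict(list)
    for career in careers:
        first = first_years(career)
        start = min(first.values())
        for subfield, year in first.items():
            if year > start:  # subfields of the first year are "birth" subfields
                delays[subfield].append(year - start)
    return {s: sum(d) / len(d) for s, d in delays.items()}


# Toy careers, invented for illustration only.
careers = [
    [(1990, "CondMat"), (1992, "CondMat"), (1998, "Interdisc")],
    [(2000, "HEP"), (2003, "Nuclear"), (2010, "Astro")],
    [(1995, "CondMat"), (2001, "Interdisc")],
]
times = mean_transition_times(careers)
print(times)
```

Counting, for each pair of subfields, how many careers start in one and later enter the other would give the raw flows that underlie a diagram like Fig. 2b, before any comparison with a null model.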

The role of chaperones

The future prosperity of young scholars has often been linked to access to valuable mentorship at the early stages of a scientific career. For example, a surprising fraction of Nobel laureates had a mentor-mentee or a co-authorship relation with another Nobel laureate [18, 19], and scientists who co-author early with an established scientist are more likely to have higher impact and higher chances to publish as lead author than other scientists. Taken together, these observations suggest that a senior scientist who acts as “chaperone” during a scientist’s early career might foster the acquisition of skills, passing on experience and knowledge necessary for high achievements later in a career.

To quantify the chaperone effect, we measure how many physicists co-author their first paper in a subfield with a physicist who has published in that subfield before. We find that the chaperone effect is particularly strong for HEP, Nuclear and CondMat, where over of physicists wrote their first paper with someone who published before in the same subfield (Fig. 2c and SI Section S8). This large share of chaperoned physicists could have several reasons, like the documented high number of physicists starting their career in these three subfields, or the need to access large facilities, which requires early-career physicists to collaborate with established scientists. Note that the typical large co-authorship patterns of HEP cannot explain the magnitude of the chaperone effect characterising this subfield (SI Section S8).

Other subfields have a lower fraction of chaperoned physicists, especially Interdisc and Astro. These subfields are often explored by more senior physicists who received mentorship at a previous stage of their careers in a different subfield and often decide to explore the new area without close supervision (26% of physicists are not chaperoned in Interdisc and Astro, Fig. 2c). On top of this, applications of computational physics, like computational biophysics or complex systems, classified as Interdisc, require lower financial resources compared to experimental research and could also play a significant role in explaining the low chaperone effect. Taken together, the chaperone effect is strong in physics, with an average rate of 82% chaperoned physicists across subfields. The effect signals a research culture where physicists often get introduced to their future research area by senior colleagues in a collaborative setting, in contrast with disciplines like mathematics, where the majority of scientists start their career by publishing solo-author papers.

Productivity, impact, and team size across subfields

Productivity and impact, capturing the number of papers published and citations received by a physicist, are frequently used metrics in the assessment of scientific careers [22, 23]. These quantities have implications for decisions and policies involving predicting, nurturing, and funding early career scientists. Yet, the proper interpretation of these metrics must account for the highly heterogeneous productivity and citation patterns characterising different subfields and for different team sizes, both of which vary in time.

Team size, i.e. the number of coauthors per paper, has been increasing steadily over the past decades in all fields, capturing an increasing collaboration in science. Are there particular differences in collaborative patterns in the different physics subfields, and what are their implications on productivity and impact? To answer this question, we assess the diversity and evolution of collaboration, productivity, and citation standards in the different subfields of physics. First, the tendency of scientists to work in increasingly large teams has been particularly pronounced in HEP (especially after 2005), Nuclear (especially after 2010) and Astro (especially after 2000) (Fig. 3a). The observed explosive growth in these three subfields is partly rooted in large-scale projects like ATLAS (SI Section S7). They also result in an increased productivity: as physicists were involved in more and larger teams, the average number of papers they published each year increased by a factor of 10 for HEP and by a factor of 2 for Nuclear and Astro from 1985 to 2015 (Fig. 3b). However, for the other six subfields productivity has stayed constant over 30 years, and for all subfields productivity has increased at a slower rate than team sizes. These different rates of increase explain why fractional productivity, i.e. the ratio between the number of papers and the average team size, decreased across all subfields (Fig. 3c). The effect is the strongest in HEP, Nuclear, and Astro, where team size grew disproportionately. It is worth noting that in these subfields authors are usually ordered alphabetically due to the large average team size, making the assessment of credit for single authors more problematic. Taken together, we find that the amount of knowledge produced per capita decreases in all subfields despite the increase in the total number of physicists and physics papers.

Given the explosive increase in both team size and the number of papers per physicist in HEP, do HEP physicists today have more or less impact than they had decades earlier? To answer this question we measured the average impact in number of citations after 5 years (Fig. 3d) and the fractional impact (ratio between number of citations and average team size, Fig. 3e) per physicist per subfield. Interestingly, the average impact of HEP shows a growth of comparable magnitude as the growth in average productivity, leading to an unchanged fractional impact. In other words, large-scale projects like ATLAS produce papers that generate a large number of citations, compensating for the massive numbers of co-authors (hundreds or more).

Given some of the large productivity differences between different subfields, we also expect differences in impact, measured in terms of cumulative citations over a career. For instance, how much impact does it take to be a scientific leader in HEP and how is that different in CondMat? In Fig. 3f and Fig. 3g we show the total number of papers and citations acquired over an average career by the top 5% of physicists in each subfield (in terms of productivity). In both terms, HEP is by far the most rewarding subfield, whose top scientists coauthor 169 papers and accumulate over 7,000 citations. In contrast, top Interdisc physicists coauthor only 18 papers with less than 1,000 citations. The large discrepancy is not explained by paper citation rates [32, 33], which are roughly constant across subfields (SI Section S9), but by the high or low number of papers per author in the respective subfield (Fig. 3b). As a consequence, when physicists with different specialties compete for positions or grants, caution is needed in comparing their profiles using metrics based on citations or productivity, as subfield-dependent differences appear from the very beginning of a career.

What about the rate of top papers in the different subfields? We selected the top 1% of all physics papers (in terms of citations) and assessed into which subfield they fall (Fig. 3h). The majority falls into CondMat, General and HEP; however, this result is trivial as these fields produce the most papers. To unveil the significant effects we measured the surplus between this top 1% distribution and the distribution of subfields of all physics papers. As Fig. 3i shows, Interdisc papers are 40% more likely to be in the top 1% than expected, while Nuclear and Plasma papers are 40% less likely to be found in the top 1%. The high rate of Interdisc among the top cited papers might be partially explained by the finding that papers which are 15% novel and 85% conventional often have high impact. Interdisc is more likely to achieve this balance, since interdisciplinary research must be novel and, at the same time, must adhere to established principles. Another explanation is that Interdisc is more likely to initiate new topics or emerging subfields. Papers that do open such new avenues are known to acquire a high number of citations as they become milestones, cited by subsequent papers once the field is established [34, 35].
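Two of the quantities used in this section lend themselves to a compact sketch: fractional productivity (number of papers divided by average team size) and the surplus of a subfield among the top 1% cited papers relative to its share of all papers. The sketch below follows these verbal definitions only; the function names are hypothetical and the paper counts are invented, chosen so that the surplus values mirror the ±40% quoted for Fig. 3i.

```python
def fractional_productivity(n_papers, team_sizes):
    """Number of papers divided by the average number of authors per paper."""
    average_team = sum(team_sizes) / len(team_sizes)
    return n_papers / average_team


def top_surplus(all_counts, top_counts):
    """Relative surplus of each subfield among top-cited papers.

    A value of 0.4 means the subfield is 40% more frequent among the
    top papers than expected from its share of all papers; -0.4 means
    it is 40% less frequent.
    """
    total_all = sum(all_counts.values())
    total_top = sum(top_counts.values())
    return {
        s: (top_counts.get(s, 0) / total_top) / (all_counts[s] / total_all) - 1
        for s in all_counts
    }


# Invented counts for illustration: all papers and top-1% papers per subfield.
all_counts = {"Interdisc": 1000, "Nuclear": 1000, "CondMat": 8000}
top_counts = {"Interdisc": 14, "Nuclear": 6, "CondMat": 80}

surplus = top_surplus(all_counts, top_counts)
print({s: round(v, 2) for s, v in surplus.items()})

# Ten papers written in teams of 2, 4 and 6 authors on average.
print(fractional_productivity(10, [2, 4, 6]))
```

In this toy example CondMat contributes most of the top papers in absolute terms, yet its surplus is zero because its share among top papers equals its share overall. This is the reason the raw distribution of Fig. 3h is normalised before subfields are compared.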

Recognition of physics subfields

Do impact differences affect the way in which the overall scientific community perceives the different subfields of physics? As a rough proxy of this recognition we take the Nobel Prizes awarded from 1985 to the present, highlighting each awarded subfield (Fig. 3j, SI Section S10). Although the Nobel Prize often recognises research undertaken long before the selection year, the timing of Nobel Prize selections could affect the way in which the relative importance of different physics communities is perceived by the committee. As a comparison between Fig. 2a and Fig. 3j shows, Nobel Prizes are not related to the number of physicists flocking into specific physics communities, nor do they show significant temporal clusters. However, the general distribution of awarded subfields reveals interesting tendencies: a large fraction of Nobel Prizes have been awarded to the “curricular” topics, like CondMat, the subfield with the largest number of active researchers, and HEP. Surprisingly, Astro, despite the relatively moderate size of its community, comes in third, with five Nobel Prizes. This success might be linked to the perception of astrophysics as a field that studies the universe on a grand scale, as well as to its strong ties to HEP, a regular recipient of Nobels. Other well established areas with a long history, such as AMO and Classical, have also been recognised. In contrast, since 1985 Plasma and Interdisc have not been awarded a Nobel Prize. The omission of Interdisc likely comes from the charter of the Nobel Prize to award clear-cut categories (e.g. physics, chemistry, medicine/physiology) rooted in the 19th century, discriminating against interdisciplinary discoveries [36, 37].

Conclusions

As one of the oldest scientific disciplines, physics plays a fundamental role in the development of science. As the aperture of physics widens, the focus of individual physicists narrows, leading progressively to the formation of specialised communities and subfields. Here we offered an intellectual census of these subfields, exploring how physicists migrate between them, how they specialise and collaborate to create impactful research.

We observed that subfields rarely live in isolation but rather tend to overlap, with individual scientists working in multiple subfields and transitioning between fields during their career. Mapping these overlaps reveals a highly non-trivial research space, displaying deep intellectual links between some subfields and large gaps between others.

Physicists who are confronted with heated arguments on the allocation of resources to different subfields and departments often use metrics of productivity or impact to seek priority. However, our research suggests that such arguments should be taken with scepticism. Indeed, there are considerable field-specific differences in the patterns of productivity and impact. Publication rates have exploded in recent years in HEP, Nuclear and Astro, whereas fractional productivity is declining. In some subfields, such as HEP, researchers co-author an exceptionally large number of papers, partly rooted in their unique culture of collaboration. By contrast, interdisciplinary physicists produce papers at a much lower rate but their papers tend to garner a disproportionally higher impact, once we factor in the relative size of the subfield. Understanding these field differences within physics represents the first step towards a deeper understanding of our discipline. As tomorrow’s physicists working on different topics compete for the same position and resources, these insights may prove pertinent for the sustainable vitality of physics as a discipline.

Our study is based on Web of Science data, lacking the literature that has been exclusively published in preprint servers like arXiv, leading to unavoidable (but small) differences in subfield representation due to diverse publication cultures in different communities. For example, the proportion of HEP and Astro papers in arXiv is higher compared to our dataset and WoS, reflecting the common practice of these communities to communicate findings in preprints rather than journal papers. However, there is a high overlap in the coverage of the physics literature between different databases and a high correlation of the representation of physics subfields (SI Section S3), indicating that our findings should agree if repeated on a different database.

In this study we focused on careers of physicists within physics. However, these days, many scientists with a background in the physical sciences contribute to fields outside of physics, from biology to finance, both in academia and the private sectors. For this reason, the investigation of the connection between physics and other scientific disciplines, and the career transitions away from physics, remains as fruitful future work. Indeed, such an investigation, possibly aided with data sources that go beyond scientific publications, could shed light on the role of physics and its subfields in the entire ecosystem of science and beyond.

Acknowledgments

This work was supported by the John Templeton Foundation Grant .5× 3× 2×2.5×observedexpected

CondMatInterdisc Astro NuclearHEPAMO

Plasma

Classical General

Figure 1.

Taking census of physics subﬁelds. a , Number of physicists per subﬁeld. b ,Percentage of physicists working in 1, 2, 3, or 4+ subﬁelds. We call the 37% of physicists whowork in only one subﬁeld specialised . c , Fraction of specialised physicists per subﬁeld. Mostsubﬁelds except for HEP , Nuclear and

CondMat have a negligible fraction of specialisedphysicists. d , The network of co-activity of individual physicists shows the nontrivialconnection between subﬁelds. Node size is proportional to number of physicists in thesubﬁeld, link width is proportional to the overlap between subﬁelds, quantiﬁed with the ratiobetween measured number of physicists working on the two subﬁelds and expected numberbased on a randomised null model. cb Figure 2.

Evolution of physics subfields and careers. a, Relative growth rate, defined as the yearly fraction of physicists who published their first paper in a new subfield. Interdisc and Astro grow, CondMat shrinks considerably. HEP displays a spike in 2010 that can be attributed to large-scale collaborations like ATLAS and CMS (SI Section S7). Relative growth rate is less reliable after 2010 because early-career physicists accumulate publications at different rates in each subfield; they therefore reach the 5-publication threshold at different times, which distorts the proportion of physicists in favor of more productive and non-specialised subfields. b, Flow diagram of career transitions. The sizes of the rectangles at the top are proportional to the number of career-first publications in each given subfield. The rectangles at the bottom are proportional to the number of physicists in each subfield who did not start their career by publishing in the area; for example, Astro and AMO have roughly the same number of physicists although Astro starts with 3%, while AMO starts with 5%. The distance from the top reflects the average time at which a career transition towards a subfield occurs. Flows are proportional to the number of physicists who first published in a subfield different from the one in which they worked previously. Only significant flows, i.e. those that are larger than expected in the null model, are shown. The percentages on the bottom rectangles report the contribution of the subfield that is contributing most. c, Fraction of not chaperoned physicists in each subfield. A large majority of physicists starting in HEP, Nuclear, or CondMat co-author their first paper with physicists who have already published in the subfield. Other subfields have a much higher fraction of physicists who are not chaperoned in.

Figure 3.

Productivity and impact across physics communities. a, Average team size, defined as the average number of authors per paper, over time. Team sizes grow in all fields, especially in HEP, Nuclear, and Astro due to large-scale experimental projects. b, Average productivity, defined as number of papers per author, over time. Productivity grows for HEP, Nuclear, and Astro but stays roughly constant for other subfields. c, Fractional productivity, i.e. number of papers divided by team size, over time. For all subfields productivity grows less than team size, therefore fractional productivity decreases. d, Average impact, defined as number of citations per author within a 5-year window. Impact increases in all fields, but only HEP shows an exceptional growth. e, Fractional impact, i.e. number of paper citations divided by team size, over time. Most subfields show a roughly constant trend until 2005. f, Number of papers of the top physicists for productivity. Due to different collaboration standards, HEP physicists coauthor more papers than other subfields. Interdisc physicists produce an especially low number of papers. g, Number of citations of the top physicists for productivity. HEP physicists receive more citations because of their high productivity. h, Fraction of top 1% cited papers per subfield and i, subfield surplus with respect to the number expected given the subfield size. Interdisc generates the highest number of high impact papers compared to its size. j, Nobel Prizes in physics per year across subfields. Plasma and Interdisc have not received an award.

References

Jones, B. F. The burden of knowledge and the “death of the renaissance man”: Is innovation getting harder? The Review of Economic Studies, 283–317 (2009).

Clauset, A., Larremore, D. B. & Sinatra, R. Data-driven predictions in the science of science. Science, 477–480 (2017).

Fortunato, S. et al. Science of science. Science, eaao0185 (2018).

Deville, P. et al. Career on the move: Geography, stratification, and scientific impact. Scientific Reports (2014).

Sinatra, R., Deville, P., Szell, M., Wang, D. & Barabási, A.-L. A century of physics. Nature Physics, 791 (2015).

Deville, P. Notices of the AMS, 212–223 (2009).

Uzzi, B., Mukherjee, S., Stringer, M. & Jones, B. Atypical combinations and scientific impact. Science, 468–472 (2013).

Foster, J. G., Rzhetsky, A. & Evans, J. A. Tradition and innovation in scientists’ research strategies. American Sociological Review, 875–908 (2015). URL https://doi.org/10.1177/0003122415601618.

Guevara, M. R., Hartmann, D., Aristarán, M., Mendoza, M. & Hidalgo, C. A. The research space: using career paths to predict the evolution of the research output of individuals, institutions, and nations. Scientometrics, 1695–1709 (2016).

ATLAS experiment reports. https://atlas.cern/updates/atlas-news/atlas-experiment-reports-its-first-physics-results-lhc.

Jia, T., Wang, D. & Szymanski, B. K. Quantifying patterns of research-interest evolution. Nature Human Behaviour, 0078 (2017).

Balassa, B. Trade liberalization and ‘revealed’ comparative advantage. Manchester School

Crosta, P. M. & Packman, I. G. Faculty productivity in supervising doctoral students’ dissertations at Cornell University. Economics of Education Review, 55–65 (2005).

Malmgren, R. D., Ottino, J. M. & Amaral, L. A. N. The role of mentorship in protégé performance. Nature, 622 (2010).

Chariker, J. H., Zhang, Y., Pani, J. R. & Rouchka, E. C. Identification of successful mentoring communities using network-based analysis of mentor–mentee relationships across Nobel laureates. Scientometrics, 1733–1749 (2017).

Zuckerman, H. Nobel laureates in science: Patterns of productivity, collaboration, and authorship. American Sociological Review

Ma, Y. & Uzzi, B. The scientific prize network predicts who pushes the boundaries of science. https://arxiv.org/abs/1808.09412 (2018).

Sekara, V. et al. The chaperone effect in science. PNAS, in print (2018).

Szell, M. & Sinatra, R. Research funding goes to rich clubs. Proceedings of the National Academy of Sciences, 14749–14750 (2015).

Sinatra, R., Wang, D., Deville, P., Song, C. & Barabási, A.-L. Quantifying the evolution of individual scientific impact. Science, aaf5239 (2016).

Liu, L. et al. Hot streaks in artistic, cultural, and scientific careers. Nature

Radicchi, F., Fortunato, S. & Castellano, C. Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences, 17268–17272 (2008).

Pavlidis, I., Petersen, A. M. & Semendeferi, I. Together we stand. Nature Physics, 700 (2014).

Wuchty, S., Jones, B. & Uzzi, B. The increasing dominance of teams in production of knowledge. Science, 1036–1039 (2007).

Shen, H.-W. & Barabási, A.-L. Collective credit allocation in science. Proceedings of the National Academy of Sciences, 12325–12330 (2014).

Lehmann, S., Jackson, A. & Lautrup, B. Measures for measures. Nature, 1003–1004 (2006).

Lehmann, S., Jackson, A. & Lautrup, B. A quantitative analysis of indicators of scientific performance. Scientometrics, 369–390 (2008).

Hicks, D., Wouters, P., Waltman, L., Rijcke, S. d. & Rafols, I. Bibliometrics: the Leiden Manifesto for research metrics. Nature (2015).

Waltman, L. A review of the literature on citation impact indicators. Journal of Informetrics, 365–391 (2016).

Lillquist, E. & Green, S. The discipline dependence of citation statistics. Scientometrics, 749–762 (2010).

Radicchi, F. & Castellano, C. Rescaling citations of publications in physics. Physical Review E, 046116 (2011).

Newman, M. The first-mover advantage in scientific publication. EPL (Europhysics Letters), 68001 (2009).

Van Noorden, R. Interdisciplinary research by the numbers. Nature News, 306 (2015).

Szell, M., Ma, Y. & Sinatra, R. Interdisciplinarity: A nobel opportunity. Accepted for publication in Nature Physics (2018).

Bromham, L., Dinnage, R. & Hua, X. Interdisciplinary research has consistently lower funding success. Nature, 684 (2016). URL http://dx.doi.org/10.1038/nature18315.

The arXiv repository. https://arxiv.org.

Martín-Martín, A., Orduna-Malea, E. & Delgado López-Cózar, E. Coverage of highly-cited documents in Google Scholar, Web of Science, and Scopus: a multidisciplinary comparison. Scientometrics, 2175–2188 (2018).

Farmer, J. D. Physicists attempt to scale the ivory towers of finance. Computing in Science & Engineering, 26–39 (1999).

Supplementary Information

Taking Census of Physics

We identify physics publications in journals which are not explicitly labelled as physics journals by means of a method first used in Refs. 1, 2. Such a method allows one to reconstruct a community in a network when only a small fraction of nodes are explicitly labelled as belonging to the community. In our case, the hypothesis is that physics papers can be found not only in conventional physics journals (core physics papers) but also in other venues (interdisciplinary physics papers). It is possible to identify such interdisciplinary papers if they have a significant number of references or citations in conventional physics venues. In Ref. […] the label propagation algorithm was first applied to an old version of the Web of Science (WoS), encoding information about scientific publications until 2012 and based on an old database structure. Here we reapply the method on an updated version of WoS purchased from Clarivate Analytics, encoding information about publications until 2017, and using a new database structure, with a different identification system for papers among other things. We obtain a new physics dataset of papers, which we want to further characterise by identifying the physics subfields they belong to. For this reason, papers in the dataset, except those of the American Physical Society (APS) journals, are then considered to be assigned a given subfield and be part of our physics communities analysis. The label propagation method at the subfield level is a modified implementation of the algorithm presented in this Section, and it is illustrated in detail in Section S3.

The label propagation method to construct the physics dataset works in the following way. Let us consider a directed network with $N$ nodes, for instance the citation network described by the WoS dataset, where nodes are scientific publications, and a directed link between publication $i$ and publication $j$ exists if paper $i$ cites paper $j$. Each node $i$ has an in-degree $k^{\mathrm{IN}}_i$ (number of citations) and an out-degree $k^{\mathrm{OUT}}_i$ (number of references).
Nodes with $k^{\mathrm{IN}}_i = 0$ and $k^{\mathrm{OUT}}_i = 0$ are publications without references and citations and are isolated nodes in the network. Additionally, in our case each node $i$ is characterised by a variable $t_i$ corresponding to the time of publication of the article.

The method is based on an iterative process where at each step $s$ the $N$ nodes are assigned to three sets: the core set $C_s$, the tangent set $T_s$ and the external set $E_s$. The core set $C_s$ includes the nodes that are considered to be part of the target community at a given time step $s$ by the algorithm. In our case, at step $s = 0$, $C_0$ includes all articles published in physics journals. The purpose of this initial core set is to act as a seed to detect other nodes that are part of the community, even if initially they are not classified as such, and that will be iteratively included in $C_s$ at subsequent steps $s > 0$. The second set is the tangent set $T_s$, and contains all the nodes outside the core set $C_s$ that have at least one (ingoing or outgoing) connection to a node within $C_s$. The third set is the external set $E_s$, and corresponds to all nodes outside the core set $C_s$ that share no connection with nodes within $C_s$, and therefore have no chance to be included into the core at the subsequent step $s + 1$. By definition we have $C_s \cup T_s \cup E_s = N$ and $C_s \cap T_s \cap E_s = \emptyset$.

The basic idea of the method is to iteratively extend the target community $C_s$ into $C_{s+1}$ by adding candidate nodes from $T_s$ that are statistically expected to be part of the community based on their connections. In our case this corresponds to identifying as physics all scientific papers which are not published in physics journals, but whose patterns of references and citations are indistinguishable from those published in the traditional physics venues. The purpose of the tangent set $T_s$ is to contain all candidate nodes, i.e. nodes that might subsequently be added to the target community $C_s$ at step $s$ after inspection of their incoming and outgoing links.
To do so, at each step $s$ and for each node $i$ we compute two variables: $r^{\mathrm{IN}}_{i,s}$ and $r^{\mathrm{OUT}}_{i,s}$. These variables quantify the expectation of a particular node to be part of the target community $C_s$ based on its incoming citations and outgoing references.

Let us focus first on incoming citations, evaluated through $r^{\mathrm{IN}}_{i,s}$, where

$$r^{\mathrm{IN}}_{i,s} = \frac{k^{\mathrm{IN},J}_{i,s}}{\hat{k}^{\mathrm{IN},J}_{i,s}}. \tag{1}$$

Here $k^{\mathrm{IN},J}_{i,s}$ corresponds to the number of incoming links (citations) to node $i$ originating from nodes in the core $C_s$. $\hat{k}^{\mathrm{IN},J}_{i,s}$, instead, accounts for the expected number of incoming links from the core in a null model where the real number of incoming and outgoing links of each node (citations and references of each paper) in the network is fixed. This last constraint corresponds to considering the directed configuration model ensemble of the original citation network, meaning that we can write

$$\hat{k}^{\mathrm{IN},J}_{i,s} = k^{\mathrm{IN}}_i \, \frac{\sum_{j \in C_s} k^{\mathrm{OUT}}_j}{\sum_{j \in N} k^{\mathrm{OUT}}_j}, \tag{2}$$

where $k^{\mathrm{IN}}_i$ denotes the total number of incoming links to node $i$, and the remaining term corresponds to the probability for a link to originate from $C_s$. As an article $i$ can receive a citation from another paper $j$ only if the latter is more recent, i.e. $t_j > t_i$, we eventually set

$$\hat{k}^{\mathrm{IN},J}_{i,s} = k^{\mathrm{IN}}_i \, \frac{\sum_{j \in C_s \mid t_j > t_i} k^{\mathrm{OUT}}_j}{\sum_{j \in N \mid t_j > t_i} k^{\mathrm{OUT}}_j}. \tag{3}$$

Similarly, the share of outgoing references is evaluated through $r^{\mathrm{OUT}}_{i,s}$, where

$$r^{\mathrm{OUT}}_{i,s} = \frac{k^{\mathrm{OUT},J}_{i,s}}{\hat{k}^{\mathrm{OUT},J}_{i,s}}, \tag{4}$$

and

$$\hat{k}^{\mathrm{OUT},J}_{i,s} = k^{\mathrm{OUT}}_i \, \frac{\sum_{j \in C_s \mid t_j < t_i} k^{\mathrm{IN}}_j}{\sum_{j \in N \mid t_j < t_i} k^{\mathrm{IN}}_j}. \tag{5}$$

A value $r^{\mathrm{IN}}_{i,s} > 1$ ($r^{\mathrm{OUT}}_{i,s} > 1$) corresponds to a node that is more likely to be cited by (to reference) nodes from the core than what would be expected at random. At each step $s$ of the process, we use the variables $r^{\mathrm{IN}}_{i,s}$ and $r^{\mathrm{OUT}}_{i,s}$ associated to nodes in $T_s$ to produce the updated core set $C_{s+1}$. First we add all nodes in $C_s$ to $C_{s+1}$. Then, for each node $i \in T_s$, we add $i$ to $C_{s+1}$ if we have

$$r^{\mathrm{IN}}_{i,s} > \tau^{\mathrm{IN}} \tag{6}$$

or

$$r^{\mathrm{OUT}}_{i,s} > \tau^{\mathrm{OUT}}. \tag{7}$$

The thresholds $\tau^{\mathrm{IN}}$ and $\tau^{\mathrm{OUT}}$ are fixed based on a parameter $p$ such that they correspond respectively to the $p$-th percentile of the distribution of $r^{\mathrm{IN}}_{i,0}$ and $r^{\mathrm{OUT}}_{i,0}$ values for nodes within the initial core set $C_0$. Once nodes $i \in T_s$ satisfying the conditions of Eq. 6 or Eq. 7 are added to the core set $C_{s+1}$, both sets $T_s$ and $E_s$ can be updated to $T_{s+1}$ and $E_{s+1}$ from $C_{s+1}$. The process stops when $C_s$ has converged, i.e. when no nodes from $T_s$ can be added to the core set $C_s$. Note that while the thresholds $\tau^{\mathrm{IN}}$ and $\tau^{\mathrm{OUT}}$ remain constant during the whole process, the values $r^{\mathrm{IN}}_{i,s}$ and $r^{\mathrm{OUT}}_{i,s}$ associated to each node $i$ will change at each iteration, given that new nodes are incorporated into the set $C_s$ at each iteration step $s$. As shown in Ref. […], in the case of physics publications in the WoS dataset the algorithm was run iteratively for […] steps, showing fast convergence.

The parameter $p$ can be considered as a tolerance parameter in the sense that it defines the minimal attraction needed for a node to be incorporated in the growing core. As described in Refs. 1, 2, in our case it is possible to set the value of $p$ by validating the algorithm on all publications of two interdisciplinary journals for which a subset is labelled explicitly as physics, namely Science (1995-2013) and PNAS (1915-2013). The best trade-off between true positive ([…]) and true negative rates ([…]) was found for $p$ = […]. By running the algorithm on the new version of the WoS dataset, comprising ∼54 million papers, with an initial core of ∼ […] Nature, and several materials and chemistry journals.

Rank | Journal (number of papers) | Journal (percentage of papers)

Table S1.

Non-physics journals with most physics publications and highest percentage of physics publications identified by means of label propagation.
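The iterative procedure of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the WoS dataset: the function names (`percentile`, `ratios`, `propagate`), the nearest-rank percentile convention, the choice to set a ratio to 0 when the expected number of links is 0, and the toy citation network are all hypothetical. The loops are quadratic in the number of papers, which is only adequate for a toy example.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank p-th percentile (0 < p <= 100); this convention is an assumption."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * p / 100))
    return ordered[rank - 1]


def ratios(core, cites, year):
    """Observed/expected ratios r_in and r_out of Eqs. (1)-(5) for every node.

    cites: list of (i, j) pairs meaning "paper i cites paper j".
    year:  dict mapping each paper to its publication time t_i.
    """
    k_in = dict.fromkeys(year, 0)
    k_out = dict.fromkeys(year, 0)
    k_in_core = dict.fromkeys(year, 0)   # citations received from core papers
    k_out_core = dict.fromkeys(year, 0)  # references pointing to core papers
    for i, j in cites:
        k_out[i] += 1
        k_in[j] += 1
        if i in core:
            k_in_core[j] += 1
        if j in core:
            k_out_core[i] += 1

    r_in, r_out = {}, {}
    for i in year:
        later = [j for j in year if year[j] > year[i]]
        earlier = [j for j in year if year[j] < year[i]]
        all_out = sum(k_out[j] for j in later)
        all_in = sum(k_in[j] for j in earlier)
        core_out = sum(k_out[j] for j in later if j in core)
        core_in = sum(k_in[j] for j in earlier if j in core)
        exp_in = k_in[i] * core_out / all_out if all_out else 0.0   # Eq. (3)
        exp_out = k_out[i] * core_in / all_in if all_in else 0.0    # Eq. (5)
        # Assumption: a node with zero expected links gets ratio 0.
        r_in[i] = k_in_core[i] / exp_in if exp_in else 0.0          # Eq. (1)
        r_out[i] = k_out_core[i] / exp_out if exp_out else 0.0      # Eq. (4)
    return r_in, r_out


def propagate(seed, cites, year, p=50):
    """Grow the core from `seed` until no tangent node passes Eq. (6) or Eq. (7)."""
    core = set(seed)
    r_in, r_out = ratios(core, cites, year)
    tau_in = percentile([r_in[n] for n in core], p)    # fixed once, on the initial core
    tau_out = percentile([r_out[n] for n in core], p)
    neighbours = defaultdict(set)
    for i, j in cites:
        neighbours[i].add(j)
        neighbours[j].add(i)
    while True:
        r_in, r_out = ratios(core, cites, year)        # ratios change as the core grows
        tangent = {n for c in core for n in neighbours[c]} - core
        added = {n for n in tangent if r_in[n] > tau_in or r_out[n] > tau_out}
        if not added:
            return core
        core |= added


# Hypothetical toy network: a-e are seed papers, x cites only seed papers,
# v and y1 touch the seed only marginally, y2, z and w never touch it.
year = {"a": 2000, "b": 2001, "c": 2002, "d": 2002, "e": 2003,
        "y1": 2000, "y2": 2000, "z": 2003, "x": 2004, "v": 2004, "w": 2004}
cites = [("b", "a"), ("c", "a"), ("c", "b"), ("d", "a"), ("d", "y1"),
         ("e", "b"), ("e", "c"), ("e", "d"),
         ("x", "a"), ("x", "c"), ("x", "e"),
         ("v", "a"), ("v", "y1"), ("v", "y2"),
         ("z", "y1"), ("z", "y2"), ("w", "y1"), ("w", "y2"), ("w", "z")]
final_core = propagate(["a", "b", "c", "d", "e"], cites, year, p=50)
print(sorted(final_core))
```

In this toy network only `x` joins the core: `v` and `y1` are in the tangent set but stay below both thresholds, while `y2`, `z` and `w` remain in the external set throughout.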

S2 Identifying physics subfields from PACS codes

Although the WoS dataset provides a thorough classification of core physics publications into different subfields (see Section S3), such classification is not detailed enough for our purposes and, most importantly, it fails to associate a subfield to publications not in physics journals. For this reason, in our work we associated publications to different subfields according to the Physics and Astronomy Classification Scheme (PACS) by the American Physical Society, a hierarchical classification used for papers in APS journals between 1977 and 2015. The classification uses four digits and an extra identifier. The first digit identifies 10 different physics subfields, namely: General (0), The Physics of Elementary Particles and Fields (shortened as HEP, 1), Nuclear Physics (2), Atomic and Molecular Physics (AMO, 3), Electromagnetism, Optics, Acoustics, Heat Transfer, Classical Mechanics, and Fluid Dynamics (Classical, 4), Physics of Gases, Plasmas, and Electric Discharges (Plasma, 5), Condensed Matter: Structural, Mechanical and Thermal Properties (6), Condensed Matter: Electronic Structure, Electrical, Magnetic, and Optical Properties (7), Interdisciplinary Physics and Related Areas of Science and Technology (Interdisc, 8), Geophysics, Astronomy, and Astrophysics (Astro, 9). We merged PACS 6 and 7 into a unique category named CondMat, in order to match other common physics classifications, such as that found for the arXiv (see Section S3). We stress that the term interdisciplinary physics, used in Ref. […] to describe physics publications in non-physics journals, is not linked to PACS 8 of the APS scheme. In the following, as well as in the main text, the term Interdisciplinary physics is reserved to identify publications and authors working in this precise subfield of physics, differently from Ref. […]

PACS can be found in the APS dataset, available from the APS upon request, encoding information about all publications that appeared in the journals of the American Physical Society until 2015. Although PACS appeared in 1977, only a small fraction of the papers were assigned one until they were enforced in 1985. For this reason, we focused our analysis on the years 1985-2015, for which our dataset has 435,722 papers with at least one PACS. A further 5,616 papers were assigned a PACS but were published before 1985. In more detail, between 1985 and 2015 we have 265,549 papers with exactly one 1-digit PACS, 138,176 with two PACS, 29,806 with three PACS, 2,160 with four PACS and 31 with five PACS.

In Fig. S1 we report the distribution of the 9 physics subfields for six well-established journals published by the APS, namely the general purpose Physical Review Letters and the specialised venues Physical Review A - E. Physical Review B (covering condensed matter and materials physics) and Physical Review C (covering nuclear physics) indeed predominantly publish papers belonging to a single subfield, respectively Condensed Matter and Nuclear Physics. Conversely, Physical Review A (covering atomic, molecular, and optical physics and quantum information), Physical Review D (covering particles, fields, gravitation, and cosmology) and Physical Review E (covering statistical, nonlinear, biological, and soft matter physics) publish across a greater mixture of subfields. As expected, Physical Review Letters, the APS flagship journal, publishes across all different domains, even though with different frequency.

Similarly to the identification of physics papers in non-physics venues, we use the papers published in the APS journals as the initial seed to assign subfields to other physics publications by means of label propagation (see Section S3 for details). In such a way, we obtain a data-driven subfield classification of physics papers in the WoS dataset.

In Fig. S2a we report the proportions of APS papers belonging to a given subfield, and compare it to that of our newly created dataset. In Fig. S2b we report the distribution of the number of subfields per paper in the APS between 1985 and 2015, as well as the fraction of papers per subfield over the years (Fig. S2c).

[Figure S1 panels: Phys. Rev. A, Phys. Rev. B, Phys. Rev. C, Phys. Rev. D, Phys. Rev. E, Phys. Rev. Lett.; x-axis: subfield (General, HEP, Nuclear, Astro, AMO, Classical, Plasma, CondMat, Interdisc); y-axis: percentage of papers]

Figure S1.

Subfield distribution for papers published in APS journals. Different APS journals show different publication patterns across subfields. Physical Review B covers predominantly CondMat, and Physical Review C is similarly focused on Nuclear. In contrast, Physical Review A, Physical Review D and Physical Review E do not cover a single, predominant subfield. Physical Review Letters is the most balanced APS journal, publishing across all subfields.
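The mapping from the first PACS digit to a subfield described in Section S2, including the merge of digits 6 and 7 into CondMat, can be written as a small lookup. The sketch below is illustrative only: the name `subfields_of` is hypothetical, and the PACS codes in the usage lines are examples chosen for their first digit rather than entries taken from the APS dataset.

```python
# First PACS digit -> subfield label, as listed in Section S2.
# Digits 6 and 7 are merged into a single CondMat category.
PACS_DIGIT_TO_SUBFIELD = {
    "0": "General",
    "1": "HEP",
    "2": "Nuclear",
    "3": "AMO",
    "4": "Classical",
    "5": "Plasma",
    "6": "CondMat",
    "7": "CondMat",
    "8": "Interdisc",
    "9": "Astro",
}


def subfields_of(pacs_codes):
    """Return the sorted list of distinct subfields for a paper's PACS codes."""
    return sorted({PACS_DIGIT_TO_SUBFIELD[code.strip()[0]] for code in pacs_codes})


# Illustrative codes: two condensed-matter codes collapse into one subfield,
# while codes starting with 8 and 0 give two distinct subfields.
print(subfields_of(["71.10.Fd", "64.60.aq"]))
print(subfields_of(["89.75.Hc", "05.45.-a"]))

# The per-paper PACS counts reported in Section S2 add up to the stated total.
counts_1985_2015 = [265_549, 138_176, 29_806, 2_160, 31]
print(sum(counts_1985_2015))
```

A paper is counted in every subfield returned by the lookup, which is why a single paper can contribute to more than one subfield in the distributions of Fig. S1.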

S3 Assigning Physics subfields to Web of Science publications

We propagate physics subfields to physics publications in the WoS dataset based on relevant patterns of references and citations to the specific subfield(s), adapting the method described in the first section of this SI. For each subfield we have a different initial core set $C_\alpha$, corresponding to all APS publications between 1985 and 2015 associated to a given subfield $\alpha$. First, we matched the papers of the APS dataset to the Web of Science dataset, either via exact DOI matching or, when the DOI is not available, by using the Levenshtein distance to compute title similarity. In this second case the match was accepted if there was at least 90% string similarity between the titles of two papers in the datasets, and the second best match had a string similarity at least 5 times worse. In this way we were able to match 90% of all the papers manually assigned to a subfield between 1985 and 2015.

Unlike the original implementation, where it was possible to set the thresholds $\tau^{\mathrm{IN}}$ and $\tau^{\mathrm{OUT}}$ by evaluating the performance of the algorithm on the 'ground truth' of physics papers published in interdisciplinary journals such as Science and PNAS, such type of validation is not possible at the subfield level. For this reason, for label propagation at the subfield level we slightly modified the original implementation. We observe that the algorithm may propagate subfields both to papers within and outside the original APS core, which is made of papers that already have a PACS code. For this reason, for each subfield $\alpha$ we selected the threshold $\tau_\alpha$ so that after […] iterations the number of papers of each subfield cannot grow more than 10% within the original APS dataset. For simplicity, we chose $\tau^{\mathrm{IN}}_\alpha = \tau^{\mathrm{OUT}}_\alpha$. Afterwards, we performed label propagation for each subfield $\alpha$ independently. We obtained a total of 1,137,670 papers in WoS published between 1985 and 2015 and classified within one of the subfields of Physics. We note that some papers outside the considered time span were also assigned a subfield, but we focused our analysis on the period 1985–2015 to be consistent with the years when PACS were systematically used in publications by the APS. As already mentioned, PACS corresponding to the two categories associated to Condensed Matter were merged into the same subfield.

It is interesting to compare the classification of papers obtained through label propagation with that of the original APS dataset. Figure S2a compares the fraction of subfields in the original and the propagated datasets. The two datasets have a similar subfield distribution with a cosine similarity of 0.99. Differences in the two datasets are likely to indicate an under- or over-representation of some areas of physics in the Physical Review series compared to the overall physics world. In Fig. S2b we report the distribution of the number of subfields per paper in the two datasets. Papers in the reconstructed physics dataset tend to be slightly more specialised ([…] of the papers are assigned to a single subfield) than those in the APS dataset ([…]). However, overall the two distributions are quite similar. Finally, in Figs. S2c,d we show the evolution of the fraction of papers of different subfields in the APS dataset and in our reconstructed dataset from 1985 to 2015. It is evident how the two datasets have very similar temporal patterns during the period under investigation.
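The title-based matching step described at the start of this section can be illustrated as follows. This is a sketch under explicit assumptions rather than the matching code used for the dataset: similarity is taken as one minus the Levenshtein distance divided by the length of the longer title, and "at least 5 times worse" is read as the runner-up having a normalised distance at least five times that of the best match. Both readings, as well as the names `levenshtein`, `similarity` and `match_title`, are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed similarity: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_title(title, candidates, min_sim=0.9, factor=5):
    """Return the accepted candidate title, or None if the match is rejected."""
    scored = sorted(((similarity(title.lower(), c.lower()), c) for c in candidates),
                    reverse=True)
    if not scored:
        return None
    best_sim, best = scored[0]
    if best_sim < min_sim:
        return None
    if len(scored) > 1:
        best_gap = 1.0 - best_sim
        second_gap = 1.0 - scored[1][0]
        # Assumed reading of "5 times worse": the runner-up must be strictly
        # farther away, by at least `factor` times the best normalised distance.
        if second_gap <= best_gap or second_gap < factor * best_gap:
            return None
    return best


print(match_title("Quantum spin liquids",
                  ["Quantum spin liquid", "Classical fluid dynamics"]))
print(match_title("Spin glass", ["Spin glass", "Spin glass"]))
```

The second call returns `None` because two identical candidates cannot be told apart, which is the ambiguity the runner-up condition is meant to exclude.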

[Figure S2 panels: a, APS vs. reconstructed physics dataset; b, fraction of papers vs. number of subfields (APS, WoS); c, d, fraction of papers per subfield vs. year]

Figure S2.

Comparison between the APS dataset and the reconstructed physics dataset. a, Scatterplot of the fraction of subfields appearing in papers of the APS dataset and in the reconstructed physics dataset. b, Distribution of the number of subfields per paper in the two datasets. c, d, Temporal evolution of the fraction of subfields between 1985 and 2015 for the two datasets.

Validation:

To test the robustness of our findings, we validated our data-driven classification of papers across subfields. As already mentioned, PACS codes were systematically introduced in publications in the APS journals in 1985. As our method classifies papers into subfields according to patterns of references and citations only, our algorithm naturally assigns subfields also to publications in the APS journals before 1985, provided that they are significantly connected to the corresponding core papers for the subfield(s). Five of the six previously analysed APS journals (with the exception of Physical Review E) were founded before 1985. In Fig. S3 we test the robustness of the subfield distributions in the journals, as a way to assess the effectiveness of our data-driven method to classify physics papers across subfields, by comparing the distribution of the subfields manually assigned between 1985 and 2015 in Physical Review A, B, C, D and Physical Review Letters with that obtained by means of label propagation for papers published before 1985 in the same journals. The two distributions are highly correlated for all journals, with cosine similarities ranging from 0.88 to 0.99.

[Figure S3 panels: Phys. Rev. A, Phys. Rev. B, Phys. Rev. C, Phys. Rev. D, Phys. Rev. Lett.; x-axis: APS propagated subfields (<1985); y-axis: APS real subfields; legend: General, HEP, Nuclear, Astro, AMO, Classical, Plasma, CondMat, Interdisc]

Figure S3.

Testing propagated subfields in APS journals before 1985. Scatterplot between the subfield distribution of the papers published in the APS journals after 1985, and the propagated subfield distribution for papers published before 1985 in the same journals. The cosine similarities between the distributions of papers before and after 1985 are: (i) Physical Review A […], (ii) Physical Review B […], (iii) Physical Review C […], (iv) Physical Review D […] and (v) Physical Review Letters […].

We also tested the robustness of our subfield categorisation by comparing it to additional sources providing alternative physics classifications, namely the physics classification provided by (i) the WoS dataset (for core physics papers only), (ii) the arXiv repository, which collects electronic preprints of papers related to physics topics. The cosine similarity between the fraction of papers in our dataset and in the two alternative datasets is quite high, respectively […] (i) […] (ii) […] nonlin category in the arXiv dataset, which we eventually mapped into the General physics subfield, actually contains papers of at least one additional subfield, i.e. Interdisc. For the same reason some of the subfields obtained from the PACS scheme do not have a direct counterpart in the other two datasets. We report the full mappings in Table S2.

WoS category | Subfield | arXiv category
/ | General | /
Fields | HEP | hep-ex, hep-lat, hep-ph, hep-th, math-ph
Nuclear Physics | Nuclear | nucl-ex, nucl-th
Astrophysics | Astro | astro-ph, gr-qc
Atomic, Molecular & Chemical Physics | AMO | quant-ph
/ | Classical | physics, nlin
Fluids & Plasmas Physics | Plasma | /
Condensed Matter Physics | CondMat | cond-mat
Multidisciplinary Physics | Interdisc | /

Table S2.

Mapping of physics categories from arXiv categories and WoS physics categories into physics subfields.

Another factor that may affect the matching is the presence of specific biases for each of these datasets, which are captured by comparing them with our new data-driven reconstructed physics dataset. For instance, the arXiv, first created as a repository for people working on High Energy Physics, shows a disproportionally high number of HEP and Astro publications. This […] HEP, and the repository has been largely used by such community.

[Figure S4 panels: a, WoS original vs. reconstructed physics dataset; b, arXiv vs. reconstructed physics dataset]

Figure S4.

Comparison between the distribution of subfields in our reconstructed physics dataset with the WoS and the arXiv physics categories. Correlation between distributions is high, with values of cosine similarity respectively equal to […] (a) and […] (b).

In Table S3, we report the five non-APS journals with most papers assigned to each subfield by means of label propagation (number of papers in brackets).

We note that the Astrophysics literature seems to be relatively disconnected from its APS core, compared to results for the other subfields. As an example, we focus on a well-established specialised journal in the area, the Astrophysical Journal, for which WoS indexes 98,482 papers, only 2,330 of which are labelled. This is because, out of the 3,724,542 outgoing references from papers published in the Astrophysical Journal, only […] are directed towards the Astro core. Similarly, out of the 4,896,146 incoming citations towards papers published in the Astrophysical Journal, only […] come from the Astro core. As a reference, we compare these numbers with those of Solid State Communications, a specialised journal in the area of Condensed Matter, for which our method assigns a subfield to 16,274 out of 35,781 papers. In this case, of the 489,625 references and 635,466 citations of the journal, […] and […] link to the CondMat core. These numbers are roughly five times higher than those for the Astrophysical Journal. As a consequence of this disconnection, it is possible that our method is underestimating the number of (possibly specialised) scientists working in Astrophysics. For both journals the fraction of citations (references) coming from (going towards) the cores associated to the other subfields is negligible.

Finally, in Fig. S5 we report the publication profile across subfields for three leading interdisciplinary journals. Unsurprisingly, most subfields are represented in all three venues. We note that the proportions of the different subfields are similar to those of the APS flagship journal, Physical Review Letters.

[Figure S5 panels: Nature, Science, PNAS; x-axis: subfield (General, HEP, Nuclear, Astro, AMO, Classical, Plasma, CondMat, Interdisc); y-axis: percentage of papers]

Figure S5.

Shares of subfields for publications in Nature, Science and PNAS. All three interdisciplinary journals publish across all subfields of physics.

Rank | General | HEP | Nuclear
Rank | Astro | AMO | Classical
Rank | Plasma | CondMat | Interdisc

Table S3.

Non-APS journals with most publications with propagated subfields.

While papers are directly associated with subfields through label propagation, we still need to assign physicists to their correct research areas. Some physicists, in particular the extremely productive ones, are likely to appear over a whole career as authors of papers belonging to multiple subfields, though some of these might not be significant. As a consequence, when assigning authors to the different subfields, we applied a statistical filter in order to assign only the subfield(s) in which their engagement is significant. In particular, we consider a physicist as significantly working in a subfield only if her share of publications in it, compared to her production across all subfields, is greater than that of the average scientist. Let us consider the bipartite weighted network $W = \{w_{i\alpha}\}$, where $w_{i\alpha}$ is an integer corresponding to the number of publications of author $i$ in subfield $\alpha$. The previous condition can hence be formalised as

$$\mathrm{RCA} = \frac{w_{i\alpha} / \sum_{\alpha} w_{i\alpha}}{\sum_{i} w_{i\alpha} / \sum_{i\alpha} w_{i\alpha}} > 1. \qquad (8)$$

This filter, known as the Revealed Comparative Advantage (RCA) index, was introduced in 1965 by Balassa and has been used previously to filter bipartite networks, as in the work of Hidalgo and Hausmann. Differently from other alternatives, it guarantees that each author is active in at least one field. We limit our analysis to authors with at least $N = …$ publications in our reconstructed physics dataset, in order to drop all the authors whose contribution to physics is marginal. This set covers 135,877 authors.

The average distribution $w_{i\alpha}$ of subfields per author is shown in Fig. S6a. In Fig. S6b we show the average fraction of papers in each subfield for authors statistically validated in a given area. This plot is similar to that of Fig. 1c of the main text, but reports more fine-grained information about the involvement of physicists in the subfields to which they are assigned. As shown, the share of publications in the assigned subfield is the highest for authors in CondMat, HEP and Nuclear. Last, in Fig. S6c we report the average career length, measured in years, of physicists who started publishing in a given year. As expected, the earlier the starting year, the longer the average time span between the first and last publications of a physicist.
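The RCA filter of Eq. (8) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the subfield labels and the toy publication counts are hypothetical, and the sketch assumes that every author and every subfield has at least one publication, so that no denominator vanishes.

```python
def rca_matrix(w):
    """Return RCA[i][a] for a count matrix w[i][a] (authors x subfields).

    Assumes every author and every subfield has at least one publication.
    """
    n_subfields = len(w[0])
    author_totals = [sum(row) for row in w]                       # sum over subfields
    subfield_totals = [sum(row[a] for row in w) for a in range(n_subfields)]  # sum over authors
    grand_total = sum(author_totals)                              # sum over both indices
    rca = []
    for row, author_total in zip(w, author_totals):
        rca.append([
            (row[a] / author_total) / (subfield_totals[a] / grand_total)
            for a in range(n_subfields)
        ])
    return rca


def significant_subfields(w, labels):
    """List, for each author, the subfields with RCA > 1."""
    return [
        [labels[a] for a, value in enumerate(row) if value > 1.0]
        for row in rca_matrix(w)
    ]


# Hypothetical toy data: 3 authors, 3 subfields (values are illustrative only).
labels = ["CondMat", "HEP", "Astro"]
w = [
    [8, 2, 0],
    [1, 1, 8],
    [5, 5, 0],
]
assigned = significant_subfields(w, labels)
print(assigned)
```

In this toy example the third author splits her output evenly between two subfields and is assigned to both, because each share exceeds the corresponding average share across all authors.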

Figure S6. Basic features of authors in our reconstructed physics dataset. a Average publication shares across subfields of a physicist. b For authors validated in a subfield, average fraction of publications in that subfield. c Average career length measured in years as a function of the starting year of a career.

Validation: To test the robustness of our subfield categorisation at the author level, we compared the number of authors working in each subfield with the number of APS members registered across APS Divisions. In Fig. S7 we report the scatterplot between the two datasets, with a cosine similarity of 0.98. The full mapping between the APS Divisions and our subfield scheme is reported in Table S4.
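As an illustration of the comparison described above, the following Python sketch computes the cosine similarity between two distributions over subfields. The counts are hypothetical placeholders, not the values behind Fig. S7, and the helper names are assumptions introduced for this example.

```python
import math


def to_fractions(counts):
    """Normalise raw counts so that they sum to one."""
    total = sum(counts)
    return [c / total for c in counts]


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical counts per subfield (illustrative only).
authors_per_subfield = [120, 300, 80]   # e.g. from the reconstructed dataset
members_per_division = [100, 310, 90]   # e.g. from APS Division membership

similarity = cosine_similarity(
    to_fractions(authors_per_subfield),
    to_fractions(members_per_division),
)
print(round(similarity, 3))
```

Cosine similarity is invariant under rescaling of either vector, so normalising the counts to fractions does not change the result; it is done here only to mirror the fractions shown in the figure.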

S5 Author disambiguation

A common problem in the analysis of scientific careers is that of author disambiguation. Our census of physics is based on merging paper information on subfields with the author information on publications provided by the WoS. Our analysis has been undertaken on the latest available version of WoS which, differently from the previous one, has a built-in author disambiguation, where authors are not classified by name but by a specific author ID. A single author ID is associated with a unique author, and can be associated with several author names when the publications authored by the same individual report slightly different name formats. Similarly, two homonyms, i.e. distinct individuals with the same author name, are associated with different author IDs. Nevertheless, we are aware that perfect disambiguation is impossible to achieve. For this reason, we decided to test the robustness of our results by replicating the analysis reported in the main text after excluding a subset of authors with names which are known to be particularly hard to disambiguate. In particular, we focused on the most common 100 Chinese and 200 Korean names (Refs. 9, 10), which correspond to 504,538 distinct author IDs in the WoS dataset, 15,982 of which are also present in our subset of physicists. Overall, results were shown to be extremely robust to the elimination of such authors. As an example, we report in Fig. S8 the starting point of our analysis, i.e. the distribution of authors across subfields. The cosine similarity between the distribution across subfields of the full set and the reduced set of physicists, without authors difficult to disambiguate, is 0.99.

It is worth mentioning that highly curated data repositories with very good author disambiguation are available for some subfields. For instance, the well-known HEP-INSPIRE dataset has an extremely reliable author disambiguation, especially needed for fields where most publications are produced by large collaborations. However, it is difficult to map the HEP-INSPIRE author disambiguation onto the built-in WoS author disambiguation. On top of this, we believe that such a merge would not add validity to our analysis. On the contrary, it would introduce a bias into the dataset, as authors publishing in different subfields would be classified according to different disambiguation procedures.

Figure S7. Comparison of the fraction of physicists associated with the different subfields and the members of the APS Divisions. Correlation between the two distributions is high, with a cosine similarity of 0.98. (Axes: APS Divisions; reconstructed physics dataset.)

Subfield     APS Divisions
General      Computational Physics, Quantum Information, Gravitation
HEP          Particles & Fields
Nuclear      Nuclear Physics, Physics of Beams
Astro        Astrophysics
AMO          Atomic, Molecular & Optical
Classical    Fluid Dynamics
Plasma       Plasma Physics
CondMat      Condensed Matter Physics, Laser Science, Polymer Physics
Interdisc    Biological Physics, Materials Physics, Chemical Physics

Table S4. Mapping of physics categories from the APS Divisions into the physics subfield scheme.

Figure S8. Testing author disambiguation. Number of authors working in each subfield: plain colour (subset of 15,982 authors difficult to disambiguate), faded colour (all other physicists). The cosine similarity between the distribution across subfields of the full set of physicists, and the set without authors hard to disambiguate, is 0.99.

In Fig. 1d we map the relations between physics subfields into a network, where nodes represent subfields and weighted links describe significant co-activity between them. Let us consider a set of $N$ physicists, and two subfields $\alpha$ and $\beta$ with respectively $N_{\alpha}$ and $N_{\beta}$ physicists. We define the co-activity $C_{\alpha\beta}$ between the two subfields as the ratio between the number of physicists $N_{\alpha\beta}$ working in both subfields $\alpha$ and $\beta$, and the expected number $\hat{N}_{\alpha\beta} = (N_{\alpha} N_{\beta}) / N$. Starting from the link with the highest weight, we plot the minimum number of links needed to have a connected network. All reported links have $C > 1$, meaning that only edges with co-activity higher than expected at random (given the size of the subfields) are shown.

In Fig. 2b we show flows of physicists from the subfield(s) of their first publication to the subfield(s) where their activity is significant (RCA > 1). Let us consider the number of physicists $F_{\alpha|\beta}$ working in subfield $\alpha$ who started their career by publishing in subfield $\beta$, so that $\sum_{\beta} F_{\alpha|\beta} = N_{\alpha}$. Subfield $\beta$ is significantly contributing to subfield $\alpha$ only if $F_{\alpha|\beta} / N_{\alpha}$ is greater than the total fraction of physicists whose first publication is in subfield $\beta$ (reported in the rectangles on the top). Only significant flows are shown.

S7 LHC and the HEP 2010 peak

In Fig. 2a we show over the years the relative number of new authors entering each subfield. We notice that HEP is characterised by a large peak in 2010. For this reason we looked at all the first publications of new HEP authors in 2010, and searched for the collaborations responsible for each paper. We found that … of the new HEP authors in 2010 have a first publication which is connected to the opening of the LHC, either directly through the ATLAS, CMS and LHCb collaborations, or indirectly (the ALICE collaboration paper by Aamodt et al. takes advantage of results by the LHC). These new authors also amount to … of the total number of new physicists across subfields, explaining the observed peak for HEP. In Fig. S9 we show the yearly fraction of physicists who published their first paper in a new subfield, after removing all new 2010 HEP authors connected to the activities of the LHC. As displayed, the peak at 2010 for HEP disappears.

Figure S9. Relative growth rate of subfields after removing new 2010 HEP authors connected to the activities of LHC. No peak is observed for HEP authors in 2010.

S8 Chaperone effect

In Fig. 3c we computed the number of chaperoned authors across subfields. The chaperone effect was originally investigated by Sekara et al. for scientific venues, measured in terms of scientists making the transition from non-last to last (senior / PI) author in papers published in a journal. Here, as we are interested in the relations, as well as the migration, between physics subfields, we focused on a simplified version of such a chaperone measure, $c$, computing the fraction of physicists first publishing in a subfield who have as co-authors at least one scientist who has already published in the area.

Despite being intuitive and close to the variable used by Sekara et al., this measure might not prove adequate in the case of subfields characterised by publication through large-scale collaborations. For this reason, we tested our results against $\tilde{c}$, a variant of the chaperone index. Given the first publication of a scientist in a subfield, $\tilde{c}$ measures the average fraction of co-authors who have already published in the area. As shown in Fig. S10, in the case of our data $c$ and $\tilde{c}$ are very highly correlated, with a cosine similarity of ….

Figure S10. Comparison between two measures of the chaperone effect. Scatterplot between the original measure $c$, quantifying the number of chaperoned authors, and the fractional measure $\tilde{c}$. The values of the two variables across subfields in our dataset are highly correlated.

S9 Authors' impact and citation rates across subfields

Top authors across subfields have very different impact, as shown in Fig. 3g. This is mainly a consequence of different productivities, rather than diverse citation patterns across subfields. Indeed, the typical number of papers produced by top authors is very heterogeneous across physics communities (Fig. 3f). In contrast, we found that the number of citations per paper is rather constant across subfields: the average is …, with all subfields falling within … standard deviations of this value. For example, papers published in HEP and Interdisc receive on average … and … citations respectively, despite the much larger impact of HEP authors. Similar results are obtained for the medians of paper citations across subfields. The average median across physics communities is 9.0, the standard deviation of the median across subfields is 1.1, and all subfields are at most 1.7 standard deviations away from the global median. The medians of paper citations for HEP and Interdisc are 9 and 11, respectively.
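The comparison of medians across subfields can be sketched in Python as follows. This is only an illustration under stated assumptions: the subfield labels "A", "B", "C" and the citation lists are hypothetical, the function name is introduced for this example, and the population standard deviation is used although the source does not state which estimator was applied.

```python
from statistics import mean, median, pstdev


def median_spread(citations_by_subfield):
    """Median citations per subfield, and each median's distance from the
    average median in units of the standard deviation across subfields."""
    medians = {s: median(c) for s, c in citations_by_subfield.items()}
    values = list(medians.values())
    centre = mean(values)       # average of the per-subfield medians
    spread = pstdev(values)     # assumption: population standard deviation
    deviations = {s: abs(m - centre) / spread for s, m in medians.items()}
    return medians, centre, spread, deviations


# Hypothetical citation counts per paper (illustrative only).
toy = {
    "A": [2, 9, 30],
    "B": [5, 11, 12],
    "C": [1, 7, 40],
}
medians, centre, spread, deviations = median_spread(toy)
print(medians, centre, round(spread, 2), max(deviations.values()))
```

Medians are less sensitive than means to a few highly cited papers, which is presumably why the source reports both statistics.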

S10 The physics Nobel prizes

In Fig. 3j we show the distribution of Nobel prizes awarded in physics across subfields. Data on Nobel prizes in physics are available on the Nobel prize website. We report all awards since 1985 in order to be consistent with the rest of our data-driven analysis of careers in physics. Each award is accompanied by a motivation, which allows us to assign the crucial discovery or stream of research that led to the Nobel prize to one or more physics subfields. In the considered time span (1985-2017), 82 scientists were awarded the Nobel prize in physics.

References

1. Sinatra, R., Deville, P., Szell, M., Wang, D. & Barabási, A.-L. A century of physics. Nature Physics, 791 (2015).
2. Deville, P. Understanding social dynamics through big data (PhD Thesis) (Université Catholique de Louvain, 2015).
3. PACS 2010 regular edition. URL https://publishing.aip.org/publishing/pacs/pacs-2010-regular-edition
4. APS dataset. URL https://journals.aps.org/datasets
5. Balassa, B. Trade liberalization and 'revealed' comparative advantage. Manchester School.
6. Hidalgo, C. A. & Hausmann, R. The building blocks of economic complexity. Proceedings of the National Academy of Sciences, 10570–10575 (2009).
7. APS divisions.
8. Smalheiser, N. R. & Torvik, V. I. Author name disambiguation. Annual Review of Information Science and Technology, 1–43 (2009). URL https://onlinelibrary.wiley.com/doi/abs/10.1002/aris.2009.1440430113 (PDF: https://onlinelibrary.wiley.com/doi/pdf/10.1002/aris.2009.1440430113)
9. Most common Chinese surnames. URL https://en.wikipedia.org/wiki/List_of_common_Chinese_surnames
10. Most common Korean surnames. URL https://en.wikipedia.org/wiki/List_of_Korean_surnames
11. Yetkin, T. New physics at ATLAS and CMS experiments with the first data. Nuclear Physics B - Proceedings Supplements, 17–26 (2010). The International Workshop on Beyond the Standard Model Physics and LHC Signatures (BSM-LHC).
12. Aamodt, K. et al. Midrapidity antiproton-to-proton ratio in pp collisions at √s = … and 7 TeV measured by the ALICE experiment. Phys. Rev. Lett., 072002 (2010). URL https://link.aps.org/doi/10.1103/PhysRevLett.105.072002
13. Sekara, V. et al. The chaperone effect in science. PNAS, in print (2018).
14. Physics Nobel prizes.