Coevolution of theoretical and applied research: a case study of graphene research by temporal and geographic analysis

Abstract

As a part of science of science (SciSci) research, the evolution of scientific disciplines has been attracting a great deal of attention recently. This kind of discipline-level analysis not only gives insight into one particular field but also sheds light on general principles of the scientific enterprise. In this paper we focus on graphene research, a fast-growing field that covers both theoretical and applied study. Using a co-clustering method, we split the graphene literature into two groups and confirm that one group corresponds to theoretical research (T) and the other to applied research (A). We analyze the proportions of T and A and find that applied research has become more and more popular since 2007. Geographical analysis demonstrates that countries have different preferences in terms of T/A and that they reacted differently to the research trend. We also analyze the interaction between the two groups and show that T relies extremely heavily on T and A relies heavily on A; however, the situation is very stable for T but changed markedly for A. No geographic difference is found in the interaction dynamics. Our results give a comprehensive picture of the evolution of graphene research and also provide a general framework that can be used to analyze other disciplines.


Ai Linh Nguyen a,1, Wenyuan Liu a,1,∗, Siew Ann Cheong a
a Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, 637371, Singapore

∗ Corresponding author

Email address: [email protected] (Wenyuan Liu)
1 These authors contributed equally.

Preprint submitted to arXiv (cs.DL), February 5, 2021

1. Introduction

As an emerging field, science of science (SciSci) has attracted a great deal of attention recently (Zeng et al., 2017; Fortunato et al., 2018; Wang and Barabási, 2021). In SciSci, science is treated as a complex system, which includes ideas, papers, scientists, funding agencies and, more importantly, the connections among them. Using methods from complex systems and complex networks, researchers have revealed many interesting findings about science from data. These include the universal citation distribution (Radicchi et al., 2008), scientists' career dynamics (Petersen et al., 2012), and the role of teams in science (Wuchty et al., 2007), just to name a few.

As a complex system, the science ecosystem has an obvious hierarchical structure: roughly speaking, science includes physical science and life science; physical science includes physics, astronomy, chemistry and earth science; physics includes mechanics, electromagnetism, thermodynamics, relativity, quantum physics and so on; and electromagnetism itself also has its own internal fine structure. At each hierarchical level, scientific elements (people, ideas, papers) are closely connected within disciplines and loosely connected between disciplines. This naturally existing modular structure is convenient for SciSci research, since we can concentrate on a particular discipline by assuming that the influence from other disciplines is negligible. This kind of discipline-level analysis not only gives insight into one particular field but also sheds light on general principles of the scientific enterprise. Therefore, interest in discipline analysis has been growing rapidly.

One of the earliest extensive studies on this subject was done by Bettencourt and Kaur (2011). The authors focused on the evolution of sustainability science, a new discipline that emerged in the 1980s.
By analyzing a large corpus of relevant publications, they found that sustainability science has been growing explosively since its advent in the 1980s. This discipline has an unusual spatial distribution of contributions: they are widely distributed in both developed and developing countries, and the collaboration network has strong roots in national capitals rather than in traditionally more academic cities. To capture the main themes that define the field, the authors decomposed the corpus into traditional disciplines and found that these themes are the integrated management of human, social, and ecological systems from an engineering and policy perspective. Through collaboration network analysis, Bettencourt and Kaur (2011) also revealed that the unification of sustainability science happened around the year 2000.

A recent discipline analysis looks at artificial intelligence (AI) research (Frank et al., 2019). The authors tried to figure out whether AI research and relevant social science fields keep pace with each other. To answer this question, they used citations to track the communication between AI research and other fields. They analyzed citation flows from 1950 to 2018 and found that these flows are neither constant nor symmetric. AI research cited philosophy, geography and art heavily in its early years, whereas current AI research cites mathematics and computer science most strongly. On the other hand, other fields did not cite AI research in proportion to its growing number of publications. There is an attention gap between AI research and social science.

Here, we conduct a discipline analysis of graphene research, a relatively young field focused on a single layer of carbon atoms arranged in a two-dimensional honeycomb lattice. Since the seminal work of Novoselov et al. (2004), which helped Andre Geim and Konstantin Novoselov win the Nobel Prize in Physics in 2010, interest in graphene has grown explosively and even led the European Commission to fund the Graphene Flagship with e .
2. Methodology

To build the dataset, we used ‘graphene’ as the topic keyword to search the Web of Science Core Collection and obtained bibliographical records of 135,617 graphene-related journal papers in August 2018. These records were also used in another paper of ours (Nguyen et al., 2020), where interested readers can find more information about them. Web of Science may not cover every journal publication on this topic; however, given the wide coverage of the Science Citation Index, most mainstream graphene papers should be included in our dataset.

There are various document types among the 135,617 records: articles, proceedings papers, reviews, meeting abstracts, etc. Since the primary focus of this paper is the coevolution between the theoretical and applied branches of graphene research, we only included research articles with DOI names and publication years in our analysis. This leaves 115,988 records, and all analyses in this paper are done with them unless mentioned otherwise.

To group these papers into clusters by research topic, we applied a block diagonal co-clustering algorithm introduced by Ailem et al. (2015, 2016), namely CoClus, to divide the papers into a number of non-overlapping clusters together with their characteristic words. It is well known that papers on different research topics tend to have different word frequency features, and researchers have demonstrated that the CoClus algorithm can effectively co-cluster document-word matrices (Ailem et al., 2016). In this paper, we assume that (a) the linguistic content of the title and abstract is sufficient to tell the topic of each article, and (b) words that appear in less than 0.01% of all records have an insignificant impact on the clustering process. Generally speaking, the CoClus algorithm aims to partition the object set I of size P and the corresponding attribute set J of size N into g non-overlapping clusters with high in-cluster density and low cross-cluster density. In our case, the goal is to split our data collection (papers and associated words) into g groups, each consisting of a set of papers and a set of words. These words are used more frequently in the papers belonging to the same group and less frequently in the papers belonging to other groups.

First, every article is represented by an N-dimensional vector $a_i$:

$$a_i = (a_{w_1}, a_{w_2}, \ldots, a_{w_N}), \tag{1}$$

where N is the number of feature words considered in this study and $a_{w_j}$ is the count of word $w_j$ in article $a_i$. By concatenating all paper vectors, we constructed a paper-word matrix A:

$$A = \begin{pmatrix} a_1 \\ \vdots \\ a_P \end{pmatrix} = \begin{pmatrix} a_{1w_1} & \cdots & a_{1w_N} \\ \vdots & \ddots & \vdots \\ a_{Pw_1} & \cdots & a_{Pw_N} \end{pmatrix}, \tag{2}$$

where $a_{Pw_N}$ represents the total count of word $w_N$ in the title and abstract of article P. Then, the algorithm clusters the matrix by introducing a block seriation C:

$$C = \{c_{pn}\}_{p=1,\ldots,P;\; n=1,\ldots,N}, \tag{3}$$

where $c_{pn} = 1$ if object p and attribute n belong to the same cluster and $c_{pn} = 0$ otherwise. Ailem et al. (2015) introduced a reformulated modularity:

$$Q(A, C) = \frac{1}{\sum_{p,n} a_{pn}} \sum_{p=1}^{P} \sum_{n=1}^{N} \left( a_{pn} - \frac{\sum_{n} a_{pn} \sum_{p} a_{pn}}{\sum_{p,n} a_{pn}} \right) c_{pn}. \tag{4}$$

It turns out that a partition with high in-cluster density and low cross-cluster density is equivalent to a high Q, and the C that returns the largest Q is the best partition we are looking for. The reformulated modularity depends linearly on C; therefore the co-clustering task can be regarded as an integer linear programming problem. A Python package, CoClust (Role et al., 2019), was used to find the partition C that returns the largest reformulated modularity Q. Each paper or word has a cluster label g in C, and we can easily construct any cluster by grouping papers or words with the same label g. This is the main idea of the CoClus algorithm, and we refer interested readers to Ailem et al. (2015, 2016).

2.3. Keywords by Z-score

The co-clustering method helps us divide papers and words into several groups. However, it is not a panacea for all research problems. Apart from the technical perspective, the result of co-clustering depends heavily on how the research problem is abstracted into mathematical form. If the abstraction is not reasonable, co-clustering may not return meaningful results. Therefore, it is very important to validate the co-clustering result from different perspectives. In this study, we built a keyword list for each cluster. If the co-clustering works well, one keyword list should have more “theoretical” words and the other should have more “applied” words, and this can be easily checked by anyone with basic science training. Beyond validation, these keywords also give us a comprehensible overview of graphene research.

There are many ways to extract keywords from a corpus (Berry and Kogan, 2010). The most straightforward option for us is to count all words in each cluster and pick the most frequent words as keywords.
However, this naive method may pick general words like “research”, “study”, “found”, “result” or even “graphene”, which are used extensively in most papers. Such words are not informative as keywords for graphene research topics. Therefore, we use the statistical significance of word occurrence to measure the correlation between words and clusters. The words with strong correlation are the keywords for that cluster.

To measure the correlation between word $w_j$ and cluster $I_k$ (with $N_k$ papers), we first count the number of papers in $I_k$ that contain $w_j$ in their titles and abstracts, and refer to it as $\hat{\mu}_{w_j,k}$. We also measure the frequency of $w_j$ in the whole data collection and denote it as $P_{w_j}$. Assuming that the number of records in $I_k$ that contain $w_j$ follows a binomial distribution, its expected mean and variance are:

$$\mu_{w_j,k} = N_k \cdot P_{w_j}, \tag{5}$$

$$\sigma^2_{w_j,k} = N_k \cdot P_{w_j} \cdot (1 - P_{w_j}). \tag{6}$$

Then, the z-score for word $w_j$ can be defined as:

$$z_{w_j} = \frac{\hat{\mu}_{w_j,k} - \mu_{w_j,k}}{\sigma_{w_j,k}}. \tag{7}$$

A word that appears more often in one cluster than in any other will have a high z-score in that cluster. On the other hand, words with a high z-score do not necessarily appear often, as long as their frequency is higher than the expected value. The top z-score words should be able to illustrate the topic of each cluster, and we use them as keywords for each cluster.

As graphene is a highly competitive field, many regions give it an important position when they make their research programmes. For example, the European Commission funds the Graphene Flagship with e £ 38 million (homepage, 2021b) and the South Korean government announced the “Technology Roadmap for Promoting Commercialization of graphene (2015–2020)” in 2015 (). Since regions have different research traditions, resources and targets, it is not surprising that they may have different aims and preferences in graphene research. To examine this hypothesis, we counted the authors' affiliations of each paper and calculated regional credit accordingly. More specifically, if all authors of paper P only have affiliations in region R, then R has full credit for paper P, that is, 1. On the other hand, if paper P is the result of an international collaboration, we first split the credit evenly among all authors. Then, for each author, we split his/her credit uniformly among all his/her affiliations. Finally, the credits associated with each affiliation are added together to get the regional credits for that paper, which can be fractions. For instance, suppose paper P has three authors $A_1$, $A_2$ and $A_3$. $A_1$ has two affiliations, one in country $C_1$ and the other in country $C_2$. $A_2$ has only one affiliation, in country $C_2$. $A_3$ has two affiliations, both in $C_3$. Then $C_1$ has credit $\frac{1}{6}$, $C_2$ has credit $\frac{1}{2} = \frac{1}{6} + \frac{1}{3}$ and $C_3$ has credit $\frac{1}{3}$.

2.5. T/A dependency

After the graphene research literature is divided into theoretical and applied branches, a question emerges naturally: how did the theoretical branch and the applied branch influence each other and shape current graphene research? Although the existence of interplay between them is apparent, the extent and strength of the interaction are far from obvious. Generally, theoretical research provides guidelines for applied research, and applied research provides a proving ground for theoretical research. Meanwhile, theoretical research inspires follow-up discussion of theoretical questions, and applied research stimulates subsequent applied study for the common interest. Given these complicated interactions, a quantitative method is needed to measure the influence between theoretical research and applied research.
Only then can we answer that question from our dataset.

In this study, we use citations to capture influence: if paper A cites only theoretical papers, only theoretical research influences A; if paper B cites only applied papers, only applied research influences B; if paper C cites both theoretical and applied papers, both theoretical and applied research influence C. At first glance, it is tempting to simply use the proportion of references to measure influence: if paper D cites 3 theoretical papers and 7 applied papers, then theoretical research has 30% influence and applied research has 70% influence over paper D. However, this method oversimplifies the process of knowledge accumulation: the 3 theoretical papers in D's references may cite some applied papers, and the 7 applied papers in D's references may depend heavily on early theoretical works. Just counting references misses this information and therefore cannot accurately reflect the interaction between T and A.

Inspired by the persistent influence in della Briotta Parolo et al. (2020), we introduce a dependency factor pair $(D^n_T, D^n_A)$ for paper n to describe its dependency on theoretical research ($D^n_T$) and applied research ($D^n_A$). A paper with a pair (0.7, 0.3) receives 70% of its influence from theoretical research and 30% from applied research. $(D^n_T, D^n_A)$ satisfies the following conditions: $0 \le D_T \le 1$, $0 \le D_A \le 1$, and $D_T + D_A = 1$, for obvious reasons. To avoid the oversimplification mentioned in the previous paragraph, we split the whole dependency into two parts: direct dependency ($d_{T/A}$) and indirect dependency ($i_{T/A}$). The direct dependency describes the proportions of T and A among a paper's references, while the indirect dependency captures the dependency of those references. The direct dependency is straightforward: for any paper, we only need to check its references: if x% of them are theoretical papers, then $d_T = x\%$ and $d_A = 1 - x\%$. We use the indirect dependency to reflect implicit influence: paper A may cite only applied papers, but if those applied papers cite many theoretical papers, paper A benefits from theoretical research indirectly, and this influence is captured by $i_{T/A}$. For paper n, the indirect dependency is defined as the average of the dependency factors of all its references, that is to say,

$$i^n_T = \frac{1}{\lVert \text{refs of } n \rVert} \sum_{m \in \text{refs of } n} D^m_T, \tag{8a}$$

$$i^n_A = \frac{1}{\lVert \text{refs of } n \rVert} \sum_{m \in \text{refs of } n} D^m_A. \tag{8b}$$

By combining the direct and indirect dependency, our definition reflects a paper's reliance on theoretical and applied research more accurately and comprehensively.

After obtaining $(d^n_T, d^n_A)$ and $(i^n_T, i^n_A)$, the direct and indirect dependency of paper n on T and A respectively, the overall dependency on T and A, $D^n_T$ and $D^n_A$, can be expressed as:

$$D^n_T = r \cdot d^n_T + (1 - r) \cdot i^n_T, \tag{9a}$$

$$D^n_A = r \cdot d^n_A + (1 - r) \cdot i^n_A, \tag{9b}$$

where r is the control parameter determining the mix ratio of direct and indirect dependency, with the constraint $0 \le r \le 1$. With r = 1, the dependency factor reduces to the direct dependency only, that is to say, it only reflects the proportions of T/A in the paper's references. On the other hand, with r = 0 the dependency factor reduces to the indirect dependency only, and the reference proportions in T/A have no explicit effect. These extreme cases are the oversimplification problem discussed in the previous paragraphs. By introducing r, we are able to overcome this oversimplification without loss of generality.

Figure 1: Illustration of the dependency factor in a citation network. In all panels, we plot the same citation network, where nodes are papers and each edge represents a citation. The arrow corresponds to the flow of influence, i.e., from a cited paper to a citing paper. The square nodes are theoretical papers and the circle nodes are applied papers. All nodes are colored according to their $D_T$ values, which are calculated using Eq. 9. Root papers are left blank as their $D_T$ are undefined. The control parameter r is set to 0 in (a), 0.5 in (b) and 1 in (c).

The effect of r is shown in Fig. 1. We discuss the numerical effect of r in the SI, and its value is set to 0 .
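As an illustration of Eqs. (8) and (9), the recursion can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used in the paper: the function name `dependency_on_T` and the input dictionaries `labels` and `refs` are hypothetical, the citation network is assumed to be acyclic with every cited paper inside the dataset, and a root paper (no references in the dataset), whose own dependency is undefined, is assumed to pass on its own label (1 for T, 0 for A) when it appears as a reference. The paper does not state how root papers enter the average, so that last choice is only one possible convention.

```python
def dependency_on_T(labels, refs, r):
    """Return {paper: D_T} for every paper with at least one reference.

    labels: dict mapping paper id -> "T" or "A" (hypothetical input format)
    refs:   dict mapping paper id -> list of cited paper ids (acyclic)
    r:      mixing ratio, 0 <= r <= 1, between direct and indirect dependency
    """
    memo = {}

    def d_t(n):
        if n in memo:
            return memo[n]
        cited = refs.get(n, [])
        if not cited:
            # ASSUMPTION (not from the paper): a root paper contributes
            # its own label when it is cited by another paper.
            value = 1.0 if labels[n] == "T" else 0.0
        else:
            direct = sum(labels[m] == "T" for m in cited) / len(cited)
            indirect = sum(d_t(m) for m in cited) / len(cited)  # Eq. (8a)
            value = r * direct + (1 - r) * indirect              # Eq. (9a)
        memo[n] = value
        return value

    # Root papers are left out of the result, as their D_T is undefined.
    return {n: d_t(n) for n in labels if refs.get(n)}


# Toy network: p cites two applied papers, one of which cites a theoretical one.
labels = {"t0": "T", "a0": "A", "a1": "A", "p": "A"}
refs = {"a1": ["t0"], "p": ["a0", "a1"]}
for r in (0.0, 0.5, 1.0):
    print(r, dependency_on_T(labels, refs, r)["p"])
```

In this toy network the direct dependency of `p` on T is zero, so any non-zero value comes from the indirect term; `D_A` follows as `1 - D_T`. For a network of realistic size the recursion would need to be replaced by an iteration in publication order to avoid deep call stacks.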

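The regional credit rule of Sec. 2.4 (split a paper's unit credit evenly among authors, then split each author's share evenly among his/her affiliations) can likewise be sketched as follows. The function name `regional_credit` and the list-of-lists input format are assumptions made for illustration only; exact fractions are used so that the credits of one paper visibly sum to 1.

```python
from collections import defaultdict
from fractions import Fraction


def regional_credit(author_affiliations):
    """Fractional credit per region for a single paper.

    author_affiliations: one list per author, holding the region of each
    of that author's affiliations (hypothetical input format).
    """
    credit = defaultdict(Fraction)  # missing regions start at 0
    author_share = Fraction(1, len(author_affiliations))
    for regions in author_affiliations:
        for region in regions:
            # Each affiliation receives an equal part of the author's share.
            credit[region] += author_share / len(regions)
    return dict(credit)


# Three authors: one affiliated in C1 and C2, one in C2, one twice in C3.
paper = [["C1", "C2"], ["C2"], ["C3", "C3"]]
for region, share in regional_credit(paper).items():
    print(region, share)
```

For this example the sketch gives 1/6 to C1, 1/2 to C2 and 1/3 to C3, and a paper whose authors are all affiliated in one region gives that region a credit of exactly 1.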
3. Results and discussion

We start our study by identifying theoretical and applied papers in our data collection. Given the complexity of research, this dichotomy may miss some information, since some papers are hard to categorize. However, this classification is well accepted by the graphene research community, and our results are in good agreement with this convention. Dividing the graphene literature into two branches captures the most significant heterogeneity inside graphene research. Therefore, we use the theoretical/applied dichotomy throughout this paper and call the two branches T and A for short.

It is well known that word frequency changes significantly from field to field. We assume that theoretical graphene papers tend to use one set of words frequently while applied graphene papers tend to use another set of words frequently, and that these two sets of words are distinct. Based on this assumption, we first counted all words in the titles and abstracts of the 115,988 papers using standard natural language processing techniques (tokenization, stopword filtering, stemming and so on). The words with frequency larger than 0.01% are selected as feature words, and we have 12328 of them. Every article is represented by a 12328-dimensional vector $(a_{w_1}, a_{w_2}, \ldots, a_{w_{12328}})$, where $a_{w_j}$ is the count of word $w_j$ in that article. Combining all paper vectors, we constructed a 115988 × 12328 paper-word matrix. The CoClus algorithm requires the number of clusters k to be chosen first and then finds the best partition for this k. Normally, the number of clusters is unknown at the beginning. The common protocol is to repeat this process with different k and choose the k with the highest modularity as the result. We ran the CoClus algorithm for a range of k starting from k = 2, and the two highest modularity values are obtained at k = 4 and k = 6; see the supplementary information for more details.

However, the cluster structure in the paper-word network is quite fuzzy, and more effort is needed to get a reasonable partition. By comparing the best partitions under different k, we noticed that they share a common feature: one group occurs repeatedly in most partitions while the other groups are not stable.
This suggests that there exists a distinct boundary between that group and the others, while the other detected structures are more or less “overfitting”. To show the stability of this “hyperstructure”, we compare the partitions with the two highest modularity values, namely k = 4 and k = 6. For k = 4, we name the stable group I and the other groups II, III and IV for convenience. Similarly, for k = 6 we name the stable group 1 and the other groups 2, 3, 4, 5 and 6. Since the union of groups II, III and IV is the complement of group I, we call it group I'; likewise, we call the union of groups 2, 3, 4, 5 and 6 group 1'. We find that group I has 43323 papers, group 1 has 42798 papers, and they have 40406 papers in common; group I' has 72665 papers, group 1' has 73190 papers, and they have 70293 papers in common. The situations are very similar for other k values. Therefore, to avoid “overfitting”, we use the “hyperstructure” with k = 4, namely groups I and I', as the partition result.

Using the keyword extraction method, we found that the stable group is about theoretical research, and we call it T. The other three groups under k = 4 are (1) synthesis and functionalization, (2) supercapacitor, and (3) sensor. This also explains the fuzzy boundaries between them, since they are closer to each other than they are to theoretical research. Since they are application-oriented compared with T, we merged them into one cluster, A. There are 72665 papers and 8162 words in group A, and 43323 papers and 4166 words in group T. We visualize the partition in Fig. 2: paper vectors in T are colored blue and paper vectors in A are colored red. We also sorted the columns so that the first 4166 columns represent words in group T and the remaining 8162 columns represent words in group A. As illustrated by Fig. 2, the first 4166 words are used more frequently in group T and the other words are used more frequently in group A, since these areas are darker than the adjacent blocks.
This is exactly the aim of co-clustering: high in-cluster density and low cross-cluster density. It is worth noting that there are three subclusters inside A (these blocks are darker than the rest of the red area). They correspond to the more specific topics (1) synthesis and functionalization, (2) supercapacitor, and (3) sensor, respectively.

Figure 2: Heatmap of the word distribution in our data collection. There are 115988 rows and 12328 columns. Each row represents a paper and each column represents a word. A filled block means that the word appears in the corresponding paper; otherwise the block is left empty. For the purpose of comparison, theoretical papers are colored blue and applied papers are colored red.

To validate the clustering result and get an intuitive picture of the research topics, we built a keyword list for each cluster. To avoid statistical fluctuations, we only considered words in the top 2% by frequency. The results are not sensitive to the quantile chosen here as long as rare words (which have large fluctuations) are dropped. These keywords are ranked by their z-scores, and the top 25 keywords for T and A are shown in Fig. 3. The lists are very informative: list A covers hot terms in applied graphene research, like electrochemical, cycles, supercapacitors, batteries, electrode, lithium and so on; list T covers key concepts in theoretical graphene research, such as Dirac, gap, spin, calculations, band, point, states and so on. Therefore, we can conclude confidently that the co-clustering method successfully divides the graphene research literature into a theoretical branch and an applied branch. We also give two more extensive word clouds for T and A in the SI.

Based on the co-clustering result, we first analyze the proportions of T and A. Among all graphene research papers, 62.6% belong to group A and 37.4% belong to group T. This suggests that graphene research attracts attention from both theoretical and applied perspectives and that both have made indispensable contributions to this field.
Furthermore, this ratio is not constant over time. As shown in Fig. 4, the proportion of T increased from 70% to around 90% during 2004-2007, then gradually decreased to lower than 30% in 2017. (Our temporal analysis focuses on papers since 2004 because that is the year graphene got global attention; papers about graphene before 2004 are rare, only about 0.5% of our dataset.) These curves indicate that at the early stage of graphene research, the theoretical branch played a dominant role and gained even more popularity until 2007. After that, the applied branch grew relatively faster and became the majority after 2012, and this trend continued until 2017.

Figure 3: Top keywords by z-score in groups A and T. Each bar's length is proportional to the word frequency in that group, and its color is based on the z-score.

The reason behind this process is not clear. Our speculation is that after the seminal paper of Novoselov et al. (2004), graphene research attracted a great deal of attention. At that time, the area was still in its infancy and many theoretical questions remained to be answered, while the preparation of graphene was difficult and expensive, so applied research was limited to a few labs. Researchers therefore published more theoretical papers than applied papers. As time elapsed, people gained more understanding of graphene, the low-hanging fruit was picked, and theoretical questions became more difficult and time-consuming. In contrast, technical advancement made the preparation of graphene easier. This lowered the research barrier and allowed more scientists to join applied research. Also, a better understanding of graphene inspired more application scenarios, which motivated more applied research. All of these factors together shape the curves in Fig. 4.
Although this hypothesis remains to be validated, our finding that the proportion of group A has increased steadily since 2007 provides a big picture of the graphene research ecosystem for researchers, companies and funding agencies.

Figure 4: The yearly percentage of T and A from 2004 to 2017.

Besides temporal evolution, we also studied the geographic distribution of graphene research. As a highly active field with enormous economic potential, it attracts scientists and engineers all over the world. Affected by tradition, manpower and funding policy, regions may have different aims and preferences in graphene research. To validate this conjecture, we calculated regional credit using the method in Sec. 2.4. By summing the credit distributions of all papers, we are able to measure each region's contribution to the whole field. As shown in Fig. 5, Mainland China is the top player in both theoretical and applied research, with a 22.4% share in T and a 51.8% share in A. That means Mainland China's contribution to applied research is even larger than the sum of all other regions. The United States is also a big player, with an 18.9% share in T and a 7.5% share in A. In contrast to Mainland China, the US has a larger share in T than in A. This suggests that Mainland China is more focused on applied graphene research while the United States takes a more balanced position.

Figure 5: The top regions by contribution in T and A. Only regions with at least 2% are shown. See Appendix A for region codes.

Furthermore, the composition of Fig. 5 (a) and (b) reveals a subtle difference between theoretical and applied research: there are 13 regions with at least a 2% contribution in T but only 7 such regions in A. This suggests that applied research is less geographically diverse than theoretical research. The reasons behind this are complicated and may involve economic factors, funding policy, research culture and so on.
We leave this for future research.

So far we have studied graphene research temporally and geographically. If these two dimensions are combined, we are able to analyze the evolution of graphene research in particular regions. More concretely, for a given region in a given year we can calculate its credit in T ($C_T$) and A ($C_A$) by the method in Sec. 2.4 using only publications in that year. Then the proportions of T and A for that region in that year can be calculated as $\frac{C_T}{C_T + C_A}$ and $\frac{C_A}{C_T + C_A}$. We plot the yearly proportions of T/A for six regions in Fig. 6. These six regions have at least a 2% share in both theoretical and applied research (see Fig. 5) and are therefore considered top players in graphene research. The curves in Fig. 6 can be thought of as regional versions or components of Fig. 4, and they show that all six regions follow the same general trend as in Fig. 4: a gradual decrease of theoretical research and increase of applied research. However, that shift occurred more slowly in the United States than in other regions. This suggests that the research community in the United States still puts considerable resources into theoretical research. This finding is very important for understanding the graphene research competition among regions and for making funding policy.

Figure 6: The yearly proportion of T and A in Mainland China, the United States, South Korea, India, Iran and Singapore from 2004 to 2017. Curves break when that region has no paper that year.

Like most, if not all, scientific research fields, graphene research has both theoretical and applied branches as indispensable parts. Each of them has its own mission and focus. At the same time, they both rely on communication with each other: theoretical research gets feedback from applied research, and applied research receives guidance from theoretical research. This inside coevolution is extremely important for any scientific research field.
If this coevolution mechanism does not work well, the research field will have difficulty moving forward, like Aristotelian physics or science in ancient non-western civilizations. Given such importance, we quantified the inside coevolution in graphene research in terms of the interplay between T and A. The dependency factors were computed for all papers, then the average values were calculated for papers in groups T and A respectively. As shown in Table 1, both T and A rely more on themselves than on the other. However, the difference is obvious: T depends critically on T (90%) while A relies on A to a lesser degree (69%). This result illustrates a remarkable difference between T and A: theoretical research is mostly driven by other theoretical research, whereas applied research pays fairly high attention to theoretical research. One possible interpretation of this difference is that theoretical graphene research has achieved significant progress and its current effort is beyond existing technology, while applied research benefits a lot from theoretical research and, as a result, A cites T, directly and indirectly, a lot. More evidence is needed to verify this explanation; however, given that in the history of science theory has been ahead of application most of the time, we believe it is a plausible explanation.

Table 1: The average dependency of the T and A groups on T and A respectively.

Dependency on      Group T   Group A
Theoretical (T)    0.90      0.31
Applied (A)        0.10      0.69

The numbers in Table 1 are aggregated results for papers published in all years. Even though they provide many insights, the temporal information is lost. To get a better understanding of the coevolution process, we calculated the average values of T/A dependency for T/A papers in each year and plot them in Fig. 7.
As shown in that figure, the dependency curves behave significantly differently for the theoretical and applied groups: the curves for T are very stable, while the curves for A underwent dramatic changes over that period. From 2004 to 2006, A's dependency on T increased from 0.3 to 0.7, and it gradually decreased afterward. That is to say, applied graphene research was mainly driven by theoretical research at the early stage and became more self-motivated as time went on. This result suggests that the internal coevolution is not a "static equilibrium" but rather a "dynamic process". Not only do the proportions of T and A change with time (Fig. 4); the interaction between them also changes. More work is needed to fully understand the reasons behind these changes, and our finding can serve as a basis for further research.

Figure 7: The yearly average dependency of T (top panel) and A (bottom panel) on T (blue) and A (red), respectively.

We have already shown the geographic difference in Fig. 6. Does such an effect also appear in T/A dependency? To answer this question, we grouped all papers according to the authors' affiliation regions and calculated their T/A dependency on T/A. We only consider papers with all authors in the same region, since it is not clear how to attribute dependency to regions for internationally collaborated papers. However, we found that single-region papers are very similar to internationally collaborated papers in terms of dependency evolution. Therefore, the results here would not change much if internationally collaborated papers were included. Please see the SI for more details. The results for the top six regions are shown in Fig. 8. Unlike in Fig. 6, all regions in Fig. 8 show roughly the same behavior, even Mainland China and the United States. This suggests that the trend we found in Fig. 7 is a universal phenomenon for graphene research and is insensitive to geographic factors.
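
The yearly quantities discussed in this section can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used for this study: it assumes that each paper already carries its group label, its per-region credit from Sec. 2.4, and its dependency factors on T and A. The record keys (`year`, `group`, `credit`, `dep_T`, `dep_A`), the function names and the toy records are all hypothetical.

```python
from collections import defaultdict


def yearly_proportions(papers, region):
    """Return {year: (share_T, share_A)} for one region.

    Shares are C_T / (C_T + C_A) and C_A / (C_T + C_A), where the credits
    are summed over that region's papers published in the given year.
    """
    credit = defaultdict(lambda: {"T": 0.0, "A": 0.0})
    for p in papers:
        c = p["credit"].get(region, 0.0)  # assumed precomputed region credit
        if c > 0:
            credit[p["year"]][p["group"]] += c
    shares = {}
    for year, c in sorted(credit.items()):
        total = c["T"] + c["A"]
        shares[year] = (c["T"] / total, c["A"] / total)
    return shares


def yearly_dependency(papers):
    """Return {(year, group): (mean dependency on T, mean dependency on A)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum dep_T, sum dep_A, count]
    for p in papers:
        s = sums[(p["year"], p["group"])]
        s[0] += p["dep_T"]  # assumed precomputed dependency factors
        s[1] += p["dep_A"]
        s[2] += 1
    return {k: (s[0] / s[2], s[1] / s[2]) for k, s in sorted(sums.items())}


# Hypothetical toy records, for illustration only (not data from the study).
toy_papers = [
    {"year": 2005, "group": "T", "credit": {"US": 1.0},
     "dep_T": 0.9, "dep_A": 0.1},
    {"year": 2005, "group": "A", "credit": {"US": 0.5, "CN": 0.5},
     "dep_T": 0.6, "dep_A": 0.4},
    {"year": 2010, "group": "A", "credit": {"CN": 1.0},
     "dep_T": 0.2, "dep_A": 0.8},
]

if __name__ == "__main__":
    print(yearly_proportions(toy_papers, "US"))
    print(yearly_dependency(toy_papers))
```

In this sketch a year in which a region has no credit is simply absent from the result, which corresponds to the breaks in the curves of Fig. 6 and Fig. 8.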

4. Conclusion

Discipline-level analysis gives many insights into the scientific enterprise and also offers an important reference for practical purposes such as career decisions, hiring decisions, and funding policy. Some important works have been done within the SciSci paradigm, for example on the evolution of sustainability science, the structure and evolution of physics, and the development of artificial intelligence (Bettencourt and Kaur, 2011; Sinatra et al., 2015; Frank et al., 2019).

Figure 8: The yearly average dependency of T (square) and A (circle) on T (blue) and A (red) in Mainland China, the United States, South Korea, India, Iran and Singapore from 2004 to 2017. Curves break when a region has no T/A paper in that year.

In this study, we complemented previous studies by investigating the evolution of graphene research in terms of its main components: the theoretical branch and the applied branch. Using the co-clustering method, we divided the graphene publication collection into two groups. By extracting each group's keywords, we confirmed that one group is about theoretical research and the other is about applied research. Overall, 37.4% of papers belong to T and the remaining 62.6% belong to A. However, these ratios are not constant over time: the proportion of T increased to around 90% in 2007 and gradually decreased afterward, while the proportion of A did just the opposite. This suggests that applied research has grown faster than theoretical research and that the attention of the graphene research community has shifted gradually from theory to application. By analyzing authors' affiliations, we computed the region credit for every paper and the total contribution of each region. The distribution of contributions is very different in T and A: many regions made significant contributions in T, while Mainland China is dominant in A. The evolution curves also show significant differences among regions. Using the dependency factor we introduced, the reliance between theoretical research and applied research is quantified. We found that this dependency is asymmetric: theoretical research is overwhelmingly influenced by itself, while applied research benefits from T and A in a more balanced way. The dependency relation is very stable for theoretical research, while it changed significantly for applied research. We found that this phenomenon is insensitive to geographic factors, which suggests it is a universal process.

Although many interesting findings were observed, several important questions remain to be answered. For instance, graphene papers were classified as either theoretical research or applied research in this study.
However, this dichotomy may fail for some papers, since collaboration between theorists and experimentalists has become common and it is inaccurate to put those papers in either T or A. In other words, an overlapping-clusters picture of graphene research may be a better description of reality. Future work should take overlapping into account, given the importance of collaboration in the modern scientific enterprise. In this study all graphene papers are treated with equal importance. This choice simplifies our analysis but deviates from reality, where a few papers receive most of the attention. It would be beneficial to incorporate this fact into our framework, and the new results may give a better picture of graphene research.

5. Acknowledgements

This research is supported by the Singapore Ministry of Education Academic Research Fund, under grant number MOE2017-T2-2-075.

6. Author Contributions

Conceived and designed the analysis: Wenyuan Liu.
Collected the data: Ai Linh Nguyen.
Performed the analysis: Ai Linh Nguyen.
Wrote the paper: Ai Linh Nguyen, Wenyuan Liu and Siew Ann Cheong.

Appendix A. Region code

CN: Mainland China. US: The United States. JP: Japan. DE: Germany. KR: South Korea. IR: Iran. IN: India. RU: Russia. GB: The United Kingdom. FR: France. ES: Spain. IT: Italy. SG: Singapore. TW: Taiwan.

References

Melissa Ailem, François Role, and Mohamed Nadif. Co-clustering document-term matrices by direct maximization of graph modularity. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pages 1807–1810, Melbourne Australia, October 2015. ACM. ISBN 9781450337946. doi: 10.1145/2806416.2806639. URL https://dl.acm.org/doi/10.1145/2806416.2806639.

Melissa Ailem, François Role, and Mohamed Nadif. Graph modularity maximization as an effective method for co-clustering text data. Knowledge-Based Systems, 109:160–173, October 2016. ISSN 09507051. doi: 10.1016/j.knosys.2016.07.002. URL https://linkinghub.elsevier.com/retrieve/pii/S0950705116302064.

Andreas Barth and Werner Marx. Graphene - A rising star in view of scientometrics. arXiv:0808.3320 [cond-mat, physics:physics], September 2008. URL http://arxiv.org/abs/0808.3320. arXiv: 0808.3320.

Michael W Berry and Jacob Kogan. Text mining applications and theory. 2010. ISBN 9780470689653 9780470749821 9780470689646. URL https://nbn-resolving.org/urn:nbn:de:101:1-201501026976. OCLC: 719451203.

L. M. A. Bettencourt and J. Kaur. Evolution and structure of sustainability science. Proceedings of the National Academy of Sciences, 108(49):19540–19545, December 2011. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1102712108.

Pietro della Briotta Parolo, Rainer Kujala, Kimmo Kaski, and Mikko Kivelä. Tracking the cumulative knowledge spreading in a comprehensive citation network. Physical Review Research, 2(1):013181, February 2020. ISSN 2643-1564. doi: 10.1103/PhysRevResearch.2.013181. URL https://link.aps.org/doi/10.1103/PhysRevResearch.2.013181.

Santo Fortunato, Carl T. Bergstrom, Katy Börner, James A. Evans, Dirk Helbing, Staša Milojević, Alexander M. Petersen, Filippo Radicchi, Roberta Sinatra, Brian Uzzi, Alessandro Vespignani, Ludo Waltman, Dashun Wang, and Albert-László Barabási. Science of science. Science, 359(6379):eaao0185, March 2018. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.aao0185.

Morgan R. Frank, Dashun Wang, Manuel Cebrian, and Iyad Rahwan. The evolution of citation graphs in artificial intelligence research. Nature Machine Intelligence, 1(2):79–85, February 2019. ISSN 2522-5839. doi: 10.1038/s42256-019-0024-5.

Graphene Flagship homepage, 2021a. URL https://graphene-flagship.eu/.

National Graphene Institute homepage, 2021b.

Peng Hui Lv, Gui-Fang Wang, Yong Wan, Jia Liu, Qing Liu, and Fei-cheng Ma. Bibliometric trend analysis on global graphene research. Scientometrics, 88(2):399–419, August 2011. ISSN 1588-2861. doi: 10.1007/s11192-011-0386-x. URL https://doi.org/10.1007/s11192-011-0386-x.

Ai Linh Nguyen, Wenyuan Liu, Khiam Aik Khor, Andrea Nanetti, and Siew Ann Cheong. The golden eras of graphene science and technology: Bibliographic evidences from journal and patent publications. Journal of Informetrics, 14(4):101067, November 2020. ISSN 17511577. doi: 10.1016/j.joi.2020.101067. URL https://linkinghub.elsevier.com/retrieve/pii/S1751157719303542.

K. S. Novoselov, A. K. Geim, S. V. Morozov, D. Jiang, Y. Zhang, S. V. Dubonos, I. V. Grigorieva, and A. A. Firsov. Electric field effect in atomically thin carbon films. Science, 306(5696):666–669, October 2004. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1102896.

A. M. Petersen, M. Riccaboni, H. E. Stanley, and F. Pammolli. Persistence and uncertainty in the academic career. Proceedings of the National Academy of Sciences, 109(14):5213–5218, April 2012. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1121429109.

F. Radicchi, S. Fortunato, and C. Castellano. Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences, 105(45):17268–17272, November 2008. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.0806977105.

François Role, Stanislas Morbieu, and Mohamed Nadif. coclust: a python package for co-clustering. Journal of Statistical Software, 88(7), 2019. ISSN 1548-7660. doi: 10.18637/jss.v088.i07.

Roberta Sinatra, Pierre Deville, Michael Szell, Dashun Wang, and Albert-László Barabási. A century of physics. Nature Physics, 11(10):791–796, October 2015. ISSN 1745-2473, 1745-2481. doi: 10.1038/nphys3494.

Dashun Wang and Albert-László Barabási. The science of science. Cambridge University Press, Cambridge, 2021. ISBN 9781108492669.

S. Wuchty, B. F. Jones, and B. Uzzi. The increasing dominance of teams in production of knowledge. Science, 316(5827):1036–1039, May 2007. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1136099.

An Zeng, Zhesi Shen, Jianlin Zhou, Jinshan Wu, Ying Fan, Yougui Wang, and H. Eugene Stanley. The science of science: From the perspective of complex systems. Physics Reports, 714-715:1–73, November 2017. ISSN 03701573. doi: 10.1016/j.physrep.2017.10.001. URL https://linkinghub.elsevier.com/retrieve/pii/S0370157317303289.